IBM released Hadoop-based InfoSphere BigInsights in May 2013. Other vendors, such as Cloudera, Hortonworks and MapR, already offer commercial Hadoop-based distributions, so it was interesting to learn how IBM stacks up against them in the Big Data landscape.
I learned more about this when I had the opportunity to get hands-on with the InfoSphere BigInsights Big Data ecosystem at an IBM boot camp during the week of October 7. My initial impression is that IBM’s technology competes strongly with others in the industry, probably more so for customers who have already invested in other IBM technologies such as PureData System for Analytics, DB2 and DataStage. InfoSphere BigInsights complements these IBM products and integrates with them very well.
Below are some of the key features that set IBM apart from the competition.
Adaptive MapReduce is IBM’s term for running small MapReduce tasks quickly with low-latency scheduling, instead of making them wait in the regular queue behind long-running MapReduce tasks.
IBM claims that Adaptive MapReduce tasks take less processing time because they use native C/C++ rather than Java. Further gains come from how certain MapReduce tasks are executed: mappers can decide at runtime to take on more work (until doing so no longer makes sense).
As a result, workflow management improves through faster execution of the class of jobs that process small files.
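To make the idea of mappers taking on more work at runtime concrete, here is a toy Python sketch of the general technique. It is not IBM’s implementation. It assumes a hypothetical cost model in which every task launch pays a fixed startup overhead, plus a hypothetical per-mapper budget (`budget_mb`) standing in for the point where taking on more work stops making sense. The names `AdaptiveMapper`, `schedule` and `startup_overhead_s` are illustrative only.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

# Hypothetical cost model (not IBM's): every map task launch pays a fixed
# startup overhead, so thousands of tiny tasks waste time on startup alone.
TASK_STARTUP_S = 1.0


@dataclass
class AdaptiveMapper:
    """A mapper that keeps claiming input splits at runtime."""
    budget_mb: int  # hypothetical point where taking more work stops paying off
    splits: list[int] = field(default_factory=list)

    def claimed_mb(self) -> int:
        return sum(self.splits)

    def run(self, queue: deque[int]) -> None:
        # Always take one split, then keep taking more while under budget.
        while queue and (not self.splits or self.claimed_mb() < self.budget_mb):
            self.splits.append(queue.popleft())


def schedule(split_sizes_mb: list[int], budget_mb: int) -> list[AdaptiveMapper]:
    """Launch mappers one after another until every split has been claimed."""
    queue = deque(split_sizes_mb)
    mappers = []
    while queue:
        mapper = AdaptiveMapper(budget_mb)
        mapper.run(queue)
        mappers.append(mapper)
    return mappers


def startup_overhead_s(mappers: list[AdaptiveMapper]) -> float:
    # Total time spent only on launching tasks under the toy cost model.
    return len(mappers) * TASK_STARTUP_S


if __name__ == "__main__":
    small_files = [1] * 1000  # 1,000 one-megabyte files
    for budget in (1, 200):
        mappers = schedule(small_files, budget_mb=budget)
        print(budget, len(mappers), startup_overhead_s(mappers))
```

With 1,000 one-megabyte files, a 1 MB budget behaves like one task per file and launches 1,000 mappers, while a 200 MB budget drains the same queue with 5 mappers, so far less time goes to task startup in this toy model.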
As part of the BigInsights platform, IBM provides out-of-the-box high availability with seamless, automatic and transparent failover for the HDFS (Hadoop Distributed File System) NameNode and the JobTracker, eliminating the need for administrative intervention and reducing cluster downtime.
With GPFS/FPO (General Parallel File System/File Placement Optimizer) support, you get an enterprise-grade, POSIX-compliant (Portable Operating System Interface) file system that enhances how data is accessed and stored in InfoSphere BigInsights and removes the single point of failure. It also offers snapshots at the operating system level.
Text Analytics is used to accurately analyze unstructured and semi-structured text. IBM claims its text analytics is twice as accurate as, and 10x faster than, the alternatives currently available on the market.
Here are some key features of the text analytics module.
BigSheets is a powerful, Excel-like platform for exploring, manipulating, transforming and presenting data. It is intended primarily for analysts and requires no prior programming experience.
Behind the scenes, BigSheets runs Pig scripts and MapReduce jobs to process data on the underlying Hadoop cluster. Users can perform joins, filters, unions and various other transformations on data from multiple sources, and final results can be displayed as charts. BigSheets currently supports line, column, bar and pie charts, and it can source data from files (JSON, delimited), all major RDBMSs (via JDBC) and Hive.
DataStage provides integration with a broad range of sources.
BigInsights provides a robust, integrated security framework.
BigSQL is a software layer that enables users to create tables and query data in BigInsights using familiar standard SQL statements.
The BigSQL query engine supports joins, unions, grouping, common table expressions and other familiar SQL constructs. BigSQL can also read data directly from relational database systems.
Depending on the query, BigSQL either uses Hadoop’s MapReduce framework to process query tasks in parallel or executes the query locally within the BigSQL server on a single node, whichever is more appropriate. For instance, a query on small tables would incur unnecessary overhead if it ran as parallel MapReduce jobs on the Hadoop cluster, so BigSQL can run it on a single node instead.
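As a rough illustration of the kind of SQL described above, the query below combines a common table expression, a UNION ALL, a join and a GROUP BY. It runs against Python’s built-in sqlite3 module as a stand-in, so it shows standard SQL rather than BigSQL itself; the tables and data are hypothetical.

```python
import sqlite3

# SQLite stands in for BigSQL here; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers    (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE web_orders   (customer_id INTEGER, amount REAL);
    CREATE TABLE store_orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers    VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO web_orders   VALUES (1, 10.0), (2, 25.0), (3, 5.0);
    INSERT INTO store_orders VALUES (1, 15.0), (3, 20.0);
""")

# A common table expression (WITH), a UNION ALL, a JOIN and a GROUP BY in one query.
QUERY = """
WITH all_orders AS (
    SELECT customer_id, amount FROM web_orders
    UNION ALL
    SELECT customer_id, amount FROM store_orders
)
SELECT c.region, SUM(o.amount) AS total
FROM all_orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.region
ORDER BY c.region
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('East', 50.0), ('West', 25.0)]
```

The point is only that these constructs are ordinary SQL; the appeal of BigSQL, as described here, is running that same familiar syntax over data stored in BigInsights.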
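The article does not say how BigSQL chooses between the two execution paths, so the following Python sketch assumes the simplest possible rule: estimate the bytes a query will scan and run it locally when that total falls under a threshold. `choose_mode`, `TableStats`, `ExecutionMode` and the 64 MB `LOCAL_LIMIT_BYTES` value are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    LOCAL = "local"          # run inside the BigSQL server on one node
    MAPREDUCE = "mapreduce"  # fan out as parallel MapReduce jobs


# Hypothetical threshold; the real decision logic is not described in the article.
LOCAL_LIMIT_BYTES = 64 * 1024 * 1024


@dataclass(frozen=True)
class TableStats:
    name: str
    size_bytes: int


def choose_mode(tables: list[TableStats], limit: int = LOCAL_LIMIT_BYTES) -> ExecutionMode:
    """Run locally when the data scanned is small enough that MapReduce
    job startup would dominate the runtime; otherwise go parallel."""
    total = sum(t.size_bytes for t in tables)
    return ExecutionMode.LOCAL if total <= limit else ExecutionMode.MAPREDUCE


if __name__ == "__main__":
    small = [TableStats("dim_region", 4_096), TableStats("dim_product", 2_000_000)]
    large = [TableStats("clickstream", 500 * 1024**3)]
    print(choose_mode(small))  # ExecutionMode.LOCAL
    print(choose_mode(large))  # ExecutionMode.MAPREDUCE
```

A real planner would likely weigh more than input size (join shapes, selectivity, cluster load); the single threshold here only illustrates why small queries are better served without MapReduce startup costs.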
My impression during the week was that these features and functions are impressive. It will be interesting to see whether the technology delivers as promised in the real world. We will all be watching.
NOTE: You can also download the entire article in PDF format by clicking here: BigInsightsArticle.