Introducing the Adapter for Hadoop/Hive/Impala

Hadoop is a collection of open source products and technologies administered by the Apache Software Foundation. It supports analysis of both structured and complex data. The core components of Hadoop are the Hadoop Distributed File System (HDFS) and the MapReduce programming framework. Hadoop is available directly from Apache and in many commercial and community distributions. The leading vendors are Cloudera, Hortonworks, and MapR.

Hadoop is designed to run on Linux and is available from many distributors. A Windows version, available from Microsoft and Hortonworks, runs on Windows Server 2012 R2.

Hive provides a JDBC driver and the Hive Query Language (HiveQL), a SQL-like interface that generates MapReduce programs. Because each query must be translated into MapReduce jobs before it runs, Hive queries carry significant overhead and respond slowly.

Cloudera Impala adds a real-time query capability that shares the metadata layer and query language with Hive. However, it requires CDH, the Cloudera distribution of Hadoop, version 4.1.0 or later.

Other vendors provide their own Hadoop distributions with proprietary SQL engines. These include Hadapt, Pivotal HD HAWQ, IBM Big SQL, Teradata AsterData SQL-H, Microsoft SQL Server PolyBase, and many others. If you are using one of these vendor products, you can use the Information Builders data adapter for the appropriate product.

The Hive/Impala adapter is JDBC-based. The DataMigrator or WebFOCUS server can run on any platform (including Windows) that can connect to the server where Hive or Impala is running.
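
To make the JDBC connection concrete, the following is a minimal Java sketch of what a client-side connection to Hive can look like. It is not the adapter's own configuration, which is done through the server tools. It assumes a HiveServer2 endpoint (the jdbc:hive2:// URL scheme) and the Hive JDBC driver on the classpath. The class name, host name, database, table name, and the --connect flag are hypothetical, and port 10000 is only the usual HiveServer2 default.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch only: names and values below are assumptions,
// not part of the adapter or its configuration.
class HiveJdbcSketch {

    // Builds a HiveServer2-style JDBC URL of the form
    // jdbc:hive2://<host>:<port>/<database>
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs one HiveQL query and returns the number of rows it produced.
    // Requires a reachable server and the Hive JDBC driver on the classpath.
    static int countRows(String url, String user, String password, String query)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical host and database; substitute your own values.
        String url = hiveUrl("hadoop-node.example.com", 10000, "default");
        System.out.println("Connection URL: " + url);

        // Only attempt a real connection when explicitly requested,
        // so the sketch runs without a Hadoop cluster.
        if (args.length > 0 && args[0].equals("--connect")) {
            // "sample_table" is a hypothetical table name.
            int rows = countRows(url, "", "", "SELECT * FROM sample_table LIMIT 10");
            System.out.println("Rows returned: " + rows);
        }
    }
}
```

Because the connection is plain JDBC over the network, the same client code runs unchanged on Windows or Linux. Only the host and port in the URL need to identify the machine where Hive or Impala is listening.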
