Introducing the Adapter for Apache Hive and Cloudera Impala

Hadoop is a collection of open-source products and technologies administered by the Apache Software Foundation. It supports analysis of both structured and complex data. The core components of Hadoop are the Hadoop Distributed File System (HDFS) and the MapReduce programming framework. Hadoop is available directly from Apache and through many commercial and community distributions. The leading vendors are Cloudera, Hortonworks, and MapR.

Hadoop is designed to run on Linux and is available from many distributors. A Windows version, available from Microsoft and Hortonworks, runs on Windows Server 2012 R2.

Hive provides a JDBC driver and the Hive Query Language (HiveQL), a SQL-like interface for generating MapReduce programs. Although Hive was originally designed for batch processing, it can now be used for reporting and Business Intelligence. For use with WebFOCUS, Hive 0.14 or later is recommended.

Cloudera Impala adds a real-time query capability that shares its metadata layer and query language with Hive. It is available on CDH and other selected Hadoop distributions.

Other vendors provide their own Hadoop distributions with proprietary SQL engines. These include Hadapt, Pivotal HD HAWQ, IBM Big SQL, Microsoft SQL Server PolyBase, and many others. If you use one of these vendor products alongside the Apache Hive adapter, you can also use the Information Builders data adapter for that product.

The Hive/Impala adapter is JDBC based. The DataMigrator or WebFOCUS server can run on any platform, including Windows, that can connect to the server where Hive or Impala is running.
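
Because the adapter is JDBC based, it can help to see what a plain JDBC connection to Hive looks like from a client machine. The following is a minimal sketch only, not the adapter's own implementation. It assumes the Apache Hive JDBC driver is on the classpath and that HiveServer2 is listening on its default port, 10000. The host name, user name, and the HiveJdbcSketch class are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcSketch {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    public static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical host and user; replace with values for your cluster.
        String url = hiveUrl("hadoop-host.example.com", 10000, "default");

        // try-with-resources closes the result set, statement, and connection.
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Raised if the driver is missing or the server is unreachable.
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

The client only needs network access to the Hive server and the driver JAR, which is why the connecting machine does not have to run Hadoop itself.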
