Hadoop Integration

SAP Sybase IQ includes a UDF API that you can use to build MapReduce components for Hadoop integration. The SAP Sybase solutions store provides examples of Hadoop integration.

The MapReduce programming model is designed for massively parallel distributed computing. It consists of two main stages, illustrated in the example after this list:

- Map: the input data set is split into independent chunks that are processed in parallel, producing intermediate key/value pairs.
- Reduce: the intermediate results are grouped by key and aggregated to produce the final output.

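As a loose analogy in SQL (the words table and its word column are illustrative, not part of any product schema), the classic word-count job maps each row to a (word, 1) pair and then reduces by summing the counts per word; a grouped aggregation expresses the same two-stage decomposition:

    -- "Map": scan the rows and emit a count of 1 per word.
    -- "Reduce": group by word and sum the emitted counts.
    SELECT word, SUM(1) AS total
    FROM words
    GROUP BY word;
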
Apache Hadoop is a MapReduce implementation: a Java software framework that automates the scheduling of map and reduce jobs across a cluster of machines.

SAP Sybase IQ supports Hadoop-like parallel scheduling using Table Parameterized Functions (TPFs), a class of external user-defined functions. TPFs accept arbitrary rowsets of table-valued input parameters and can be parallelized in a distributed server environment. You can specify partitioning and ordering requirements on the TPF input. As a developer, you can use TPFs to exploit the MapReduce paradigm from within the database server, using SQL.
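
For illustration, the following is a minimal sketch of declaring and invoking a hypothetical TPF named word_count_tpf; the procedure name, external entry point, library name, and table schema are all assumptions, and the exact declaration options are covered in the User-Defined Functions guide:

    -- Declare the TPF. The external C/C++ entry point word_count_proc
    -- and the library my_udf_library are hypothetical placeholders.
    CREATE PROCEDURE word_count_tpf(
        IN input_rows TABLE( doc_id INT, word VARCHAR(100) ) )
    RESULT ( word VARCHAR(100), total INT )
    EXTERNAL NAME 'word_count_proc@my_udf_library';

    -- Invoke the TPF over a table-valued input parameter. The PARTITION BY
    -- clause lets the server distribute partitions across nodes, much as a
    -- MapReduce scheduler distributes input splits to map tasks.
    SELECT *
    FROM word_count_tpf(
        TABLE( SELECT doc_id, word FROM documents )
        OVER ( PARTITION BY word ) );

In this sketch, each partition plays the role of a map task's input split, and the aggregation performed inside the TPF corresponds to the reduce step.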

For TPF fundamentals, see the User-Defined Functions guide.