Reading a File in a Hadoop Distributed File System as an In-Memory Table

This data federation example shows how SAP Sybase IQ reads a file in the Hadoop Distributed File System (HDFS) as an in-memory table.

  1. Create the Java class:
    package example; // Matches the EXTERNAL NAME used when creating the procedure

    import java.io.BufferedReader;
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.sql.ResultSet;
    import java.sql.Types;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    // ResultSetImpl and ResultSetMetaDataImpl are not defined in this example;
    // import them from the ResultSet implementation available in your environment.

    public class HDFSclient {
        public static void readFileByLine(String file, ResultSet rset[])
                throws IOException {

            // Set Configuration to point to the HDFS NameNode and locate the input file
            Configuration conf = new Configuration();
            conf.addResource(new Path("/home/mymachine/hadoop/conf/core-site.xml"));
            FileSystem fileSystem = FileSystem.get(conf);
            Path path = new Path(file);
            if (!fileSystem.exists(path)) {
                System.out.println("File " + file + " does not exist");
                return;
            }

            // Create metadata for the result set: a single VARCHAR column named c1
            ResultSetMetaDataImpl rsmd = new ResultSetMetaDataImpl(1);
            rsmd.setColumnType(1, Types.VARCHAR);
            rsmd.setColumnName(1, "c1");
            rsmd.setColumnLabel(1, "c1");
            // Assumed width; chosen to match VARCHAR(255) in the RESULT clause
            rsmd.setColumnDisplaySize(1, 255);
            rsmd.setTableName(1, "MyTable");

            // Create the ResultSet using the metadata
            ResultSetImpl rs = null;
            try {
                rs = new ResultSetImpl((ResultSetMetaDataImpl) rsmd);
                rs.beforeFirst(); // Make sure we are at the beginning
            } catch (Exception e) {
                System.out.println("Could not create result set.");
                System.out.println(e.toString());
            }

            // Read the file line by line, inserting each line into rs
            String line;
            DataInputStream in = new DataInputStream(fileSystem.open(path));
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            while ((line = reader.readLine()) != null) {
                try {
                    rs.insertRow(); // Insert a new row
                    rs.updateString(1, line);
                } catch (Exception e) {
                    System.out.println("Could not insert row/data");
                    System.out.println(e.toString());
                }
            }
            try {
                rs.beforeFirst(); // Make sure we are at the beginning
            } catch (Exception e) {
                System.out.println(e.toString());
            }

            rset[0] = rs; // Assign the result set to the first element of the passed-in array

            in.close();
            reader.close();
            fileSystem.close();
        }
    }
    
  2. Install the class or the packaged JAR file:
    INSTALL JAVA NEW JAR 'myjar' FROM FILE '/home/mymachine/UDFs/myjar.jar';
  3. Create the procedure:
    CREATE OR REPLACE PROCEDURE readFileByLine( IN fileName CHAR(50) )
    RESULT ( c1 VARCHAR(255) )
    EXTERNAL NAME 'example.HDFSclient.readFileByLine(Ljava/lang/String;[Ljava/sql/ResultSet;)V'
    LANGUAGE JAVA;
  4. Query the procedure:
    SELECT c1 FROM readFileByLine('/home/mymachine/input/input.txt');
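
The signature string in the EXTERNAL NAME clause follows the JVM method descriptor format: parameter types in parentheses, then the return type (V for void). The following minimal sketch, which is not part of the original example, shows one way to derive that string with the standard java.lang.invoke.MethodType class instead of writing it by hand. The class name DescriptorSketch and the helper descriptorFor are hypothetical names introduced for illustration only.

```java
import java.lang.invoke.MethodType;
import java.sql.ResultSet;

// Hypothetical helper class, for illustration only.
class DescriptorSketch {

    // Returns the JVM method descriptor for the given return and parameter types.
    static String descriptorFor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same shape as readFileByLine(String file, ResultSet rset[]) returning void.
        String descriptor = descriptorFor(void.class, String.class, ResultSet[].class);
        // Prints: example.HDFSclient.readFileByLine(Ljava/lang/String;[Ljava/sql/ResultSet;)V
        System.out.println("example.HDFSclient.readFileByLine" + descriptor);
    }
}
```

If the Java method's parameter list changes, the descriptor in the CREATE PROCEDURE statement presumably needs to change with it; regenerating the string this way is one option for keeping the two in step.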