Thursday, May 21, 2015

Process SequenceFile without Enabling Hadoop Platform

Recently I needed to read Hadoop SequenceFiles on a machine that is not running the Hadoop platform. Most examples, however, show how to read and write SequenceFiles on a running Hadoop installation. So how can such files be read without Hadoop?
There is a workaround: install only the Hadoop binaries and let the Hadoop library read the file from the local file system.
1. Download the Hadoop binary distribution from the Hadoop site. On Linux/Unix, download it directly; on Windows, use the pre-built archive hadoop-common-2.2.0-bin (source code is here), created by Abhijit Ghosh.
2. Set the environment variable HADOOP_HOME to the path of that directory (for example, /usr/local/hadoop on Unix or C:/hadoop-common-2.2.0-bin on Windows).
3. Append $HADOOP_HOME/bin to the end of the PATH environment variable (that is, /usr/local/hadoop/bin on Unix or C:/hadoop-common-2.2.0-bin/bin on Windows).
4. Write your program like this (note that you also need the hadoop-common library, version 2.2 or later):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class ProcessSequenceFile {

    @SuppressWarnings("deprecation") // SequenceFile.Reader(fs, file, conf) is deprecated in Hadoop 2.x
    public static void readSequenceFile(String sequenceFileName) throws IOException {
        Configuration conf = new Configuration();
        // "file:///" selects Hadoop's local file system, so no HDFS is needed.
        // The URI must not be URL-encoded, otherwise its scheme is lost.
        FileSystem fs = FileSystem.get(URI.create("file:///"), conf);
        Path file = new Path(sequenceFileName);
        // try-with-resources closes the reader even if reading fails.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf)) {
            // Instantiate the key and value classes declared in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println("Key:" + key);
                System.out.println("=================");
                System.out.println(value);
            }
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: ProcessSequenceFile <sequence-file>");
            return;
        }
        try {
            // args[0] is the SequenceFile to read.
            readSequenceFile(args[0]);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
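
If the program fails with a "Failed to locate the winutils binary" error, steps 2 and 3 are the usual suspects. The sketch below checks the setup using only the JDK, before any Hadoop class is loaded. The class and method names (HadoopHomeCheck, resolveHadoopHome, hasBinDirectory) are made up for this post. It also assumes that Hadoop 2.x looks at the hadoop.home.dir system property first and falls back to HADOOP_HOME; that is my understanding of Hadoop's Shell class, so please verify it against your version.

```java
import java.io.File;

public class HadoopHomeCheck {

    /**
     * Picks the Hadoop home directory: the system property wins, then the
     * environment variable. This ordering is an assumption about Hadoop 2.x.
     */
    public static String resolveHadoopHome(String systemProperty, String environmentVariable) {
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return systemProperty;
        }
        if (environmentVariable != null && !environmentVariable.isEmpty()) {
            return environmentVariable;
        }
        return null;
    }

    /** Returns true when a "bin" directory exists under the given home. */
    public static boolean hasBinDirectory(String hadoopHome) {
        return hadoopHome != null && new File(hadoopHome, "bin").isDirectory();
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));
        if (home == null) {
            System.err.println("Neither hadoop.home.dir nor HADOOP_HOME is set.");
        } else if (!hasBinDirectory(home)) {
            System.err.println("No bin directory found under " + home);
        } else {
            System.out.println("Hadoop home looks usable: " + home);
        }
    }
}
```

Run it in the same shell or IDE configuration as ProcessSequenceFile, since environment variables set elsewhere may not be visible to the JVM.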

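Another cheap check before handing a file to SequenceFile.Reader is to look at its first bytes: a SequenceFile starts with the three characters "SEQ" followed by one version byte. The sketch below needs only the JDK. The class name SequenceFileMagic and its method are hypothetical helpers, not part of Hadoop, and a passing check only means the header looks right, not that the whole file is valid.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SequenceFileMagic {

    /**
     * Returns the format version byte if the file starts with "SEQ",
     * or -1 if the header is missing or the file is shorter than four bytes.
     */
    public static int sequenceFileVersion(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] header = new byte[4];
            try {
                in.readFully(header);
            } catch (EOFException e) {
                return -1; // fewer than four bytes available
            }
            boolean magic = header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
            return magic ? (header[3] & 0xFF) : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: SequenceFileMagic <file>");
            return;
        }
        int version = sequenceFileVersion(Paths.get(args[0]));
        if (version < 0) {
            System.out.println(args[0] + " does not look like a SequenceFile.");
        } else {
            System.out.println(args[0] + " has a SequenceFile header, version " + version + ".");
        }
    }
}
```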
Enjoy it!

References:
1. Gnosis Runmination, "Running a Hadoop program on Windows 7 reports: Failed to locate the winutils binary in the hadoop binary path." [Online]. Available: http://www.cnblogs.com/zq-inlook/p/4386216.html
2. Stack Overflow, "Running Apache Hadoop 2.1.0 on Windows." [Online]. Available: http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
3. Abhijit Ghosh, "ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path," SrcCodes.com. [Online]. Available: http://www.srccodes.com/p/article/39/error-util-shell-failed-locate-winutils-binary-hadoop-binary-path
4. Hadoop, "Native Libraries Guide." [Online]. Available: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/NativeLibraries.html#Native_Hadoop_Library
