I have my Data in a CSV file. I want to read the CSV file which is in HDFS.
Can anyone help me with the code??
I'm new to hadoop. Thanks in Advance.
The classes required for this are FileSystem, FSDataInputStream and Path. Client should be something like this :
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
FSDataInputStream inputStream = fs.open(new Path("/path/to/input/file"));
System.out.println(inputStream.readChar());
}
FSDataInputStream has several read
methods. Choose the one which suits your needs.
If it is MR, it's even easier :
public static class YourMapper extends
Mapper<LongWritable, Text, Your_Wish, Your_Wish> {
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
//Framework does the reading for you...
String line = value.toString(); //line contains one line of your csv file.
//do your processing here
....................
....................
context.write(Your_Wish, Your_Wish);
}
}
}
If you want to use mapreduce you can use TextInputFormat to read line by line and parse each line in mapper's map function.
Other option is to develop (or find developed) CSV input format for reading data from file.
There is one old tutorial here http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html but logic is same in new versions
If you are using single process for reading data from file it is same as reading file from any other file system. There is nice example here https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
HTH
来源:https://stackoverflow.com/questions/17145463/how-to-read-a-csv-file-from-hdfs