Saving JSON data in HDFS in Hadoop

不想你离开。 submitted on 2019-12-06 02:11:21

If you just want to write a list of JSON objects to HDFS and do not care about the key/value distinction, you can use NullWritable as your Reducer's output value type:

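import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
// JSONObject comes from your JSON library of choice (org.json is assumed here).

public static class TokenCounterReducer extends Reducer<Text, Text, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            JSONObject jsn = new JSONObject();
            ....
            // Emit the JSON string as the key; NullWritable.get() is the placeholder value.
            context.write(new Text(jsn.toString()), NullWritable.get());
        }
    }
}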

Note that you will also need to update your job configuration to match:

job.setOutputValueClass(NullWritable.class);

I understood "writing your JSON object to HDFS" to mean storing a String representation of your JSON, which is what I describe above. If you want to store a binary representation of your JSON in HDFS, you would need to use a SequenceFile. You could of course write your own Writable for this, but the approach above is simpler if all you need is a plain String representation.
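With a NullWritable value, Hadoop's default text output writes only the key followed by a newline, so each output file ends up holding one JSON document per line. The sketch below only mimics that layout with plain java.io so it can be run without a cluster; it is not Hadoop code, and the class name JsonLinesLayoutDemo and the sample records are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class JsonLinesLayoutDemo {
    // Mimics a text output with an empty value:
    // each record's key (a JSON string here) followed by a newline.
    static byte[] writeLines(List<String> jsonRecords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String json : jsonRecords) {
            byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    // Reads the content back, treating every line as one JSON document.
    static List<String> readLines(byte[] fileContent) {
        String text = new String(fileContent, StandardCharsets.UTF_8);
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not taken from the original question.
        List<String> records = Arrays.asList(
                "{\"word\":\"hadoop\",\"count\":3}",
                "{\"word\":\"hdfs\",\"count\":1}");
        byte[] content = writeLines(records);
        for (String line : readLines(content)) {
            System.out.println(line);
        }
    }
}
```

Because every line is a complete JSON document, a downstream job can split the file by line and parse each record independently.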

You can use Hadoop's OutputFormat interfaces to create custom formats that write the data however you wish. For instance, if you need the data written as a single JSON object, you could do this:

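import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class JsonOutputFormat extends TextOutputFormat<Text, IntWritable> {
    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(
            TaskAttemptContext context) throws IOException,
                  InterruptedException {
        Configuration conf = context.getConfiguration();
        // One work file per task attempt, so parallel reducers do not
        // overwrite each other (a single file named after the job would collide).
        Path file = getDefaultWorkFile(context, ".json");
        FileSystem fs = file.getFileSystem(conf);
        FSDataOutputStream out = fs.create(file, false);
        return new JsonRecordWriter(out);
    }

    private static class JsonRecordWriter extends
          LineRecordWriter<Text, IntWritable> {
        boolean firstRecord = true;

        public JsonRecordWriter(DataOutputStream out)
                throws IOException {
            super(out);
            out.write('{'); // open the JSON object before the first record
        }

        @Override
        public synchronized void write(Text key, IntWritable value)
                throws IOException {
            if (!firstRecord) {
                // Separate records with a comma; nothing precedes the first one.
                out.write(",\r\n".getBytes(StandardCharsets.UTF_8));
            }
            firstRecord = false;
            // Write UTF-8 bytes; writeChars would emit two bytes per character.
            // Note: the key is not JSON-escaped here.
            out.write(("\"" + key.toString() + "\":\""
                    + value.toString() + "\"").getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public synchronized void close(TaskAttemptContext context)
                throws IOException {
            out.write('}'); // close the JSON object after the last record
            super.close(context);
        }
    }
}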
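The framing logic of a writer like JsonRecordWriter (opening brace, comma between records, closing brace) can be checked without a cluster by running it against an in-memory stream. The following is a minimal sketch under that assumption: JsonFramingDemo, frame and writeCharsLength are hypothetical names, and the sample pairs are invented. It also shows why byte-oriented writes matter: DataOutputStream.writeChars emits two bytes per character, which would not produce a UTF-8 text file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class JsonFramingDemo {
    // Frames key/value pairs as one JSON object, written as UTF-8 bytes.
    static byte[] frame(String[][] pairs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write('{');
            boolean firstRecord = true;
            for (String[] pair : pairs) {
                if (!firstRecord) {
                    // Separator goes before every record except the first.
                    out.write(",\r\n".getBytes(StandardCharsets.UTF_8));
                }
                firstRecord = false;
                String entry = "\"" + pair[0] + "\":\"" + pair[1] + "\"";
                out.write(entry.getBytes(StandardCharsets.UTF_8));
            }
            out.write('}');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns how many bytes DataOutputStream.writeChars produces for s.
    static int writeCharsLength(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeChars(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        byte[] json = frame(new String[][] {{"hadoop", "3"}, {"hdfs", "1"}});
        System.out.println(new String(json, StandardCharsets.UTF_8));
        // Two characters written with writeChars take four bytes.
        System.out.println(writeCharsLength("{}"));
    }
}
```

Keeping the separator check ahead of the flag update is what guarantees the comma appears between records and never before the first one.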

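And if you do not want the key in your output, emit a NullWritable key instead (the stock TextOutputFormat writer skips NullWritable keys and writes only the value), like this: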

context.write(NullWritable.get(), new IntWritable(sum));

HTH
