Advantages of using NullWritable in Hadoop

时光说笑 2021-01-30 11:18

What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I see the fol

3 answers
  • 2021-01-30 11:45

    I changed the run method, and it worked:

    @Override
    public int run(String[] strings) throws Exception {
        Configuration config = HBaseConfiguration.create();  
        //set job name
        // Job.getInstance replaces the deprecated Job constructor
        Job job = Job.getInstance(config, "Import from file ");
        job.setJarByClass(LogRun.class);
        //set map class
        job.setMapperClass(LogMapper.class);
    
        //set output format and output table name
        //job.setOutputFormatClass(TableOutputFormat.class);
        //job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "crm_data");
        //job.setOutputKeyClass(ImmutableBytesWritable.class);
        //job.setOutputValueClass(Put.class);
    
        TableMapReduceUtil.initTableReducerJob("crm_data", null, job);
        job.setNumReduceTasks(0);
        TableMapReduceUtil.addDependencyJars(job);
    
        FileInputFormat.addInputPath(job, new Path(strings[0]));
    
        int ret = job.waitForCompletion(true) ? 0 : 1;
        return ret;
    }
    
  • 2021-01-30 11:58

    You can always wrap your string in your own Writable class and serialize a boolean flag that says whether the string is non-blank:

    @Override
    public void readFields(DataInput in) throws IOException { 
        ...
        boolean hasWord = in.readBoolean();
        // Hadoop reuses Writable instances, so clear any stale value when the flag is false
        word = hasWord ? in.readUTF() : null;
        ...
    }
    

    and

    @Override
    public void write(DataOutput out) throws IOException {
        ...
        boolean hasWord = StringUtils.isNotBlank(word);
        out.writeBoolean(hasWord);
        if(hasWord) {
            out.writeUTF(word);
        }
        ...
    }
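
    Putting the two methods together, a minimal JDK-only sketch of such a wrapper might look like the following. The class name OptionalWord and the serialize/deserialize helpers are hypothetical, and the class deliberately does not declare "implements org.apache.hadoop.io.Writable" so that it runs without Hadoop on the classpath; in a real job you would add that interface. The blank check uses plain String methods as a stand-in for StringUtils.isNotBlank.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical wrapper (assumption: not part of Hadoop). In a real job it
// would also declare "implements org.apache.hadoop.io.Writable".
public class OptionalWord {
    private String word;

    public OptionalWord() { }

    public OptionalWord(String word) { this.word = word; }

    public String getWord() { return word; }

    // Writes a one-byte flag, then the string only if it is non-blank.
    public void write(DataOutput out) throws IOException {
        boolean hasWord = word != null && !word.trim().isEmpty();
        out.writeBoolean(hasWord);
        if (hasWord) {
            out.writeUTF(word);
        }
    }

    // Reads the flag first; resets the field to null when no string follows,
    // so a reused instance never keeps a value from a previous record.
    public void readFields(DataInput in) throws IOException {
        boolean hasWord = in.readBoolean();
        word = hasWord ? in.readUTF() : null;
    }

    // Helper for the demo: serialize one value into a byte array.
    static byte[] serialize(OptionalWord value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        value.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    // Helper for the demo: deserialize into an existing (reused) instance.
    static OptionalWord deserialize(byte[] bytes, OptionalWord reuse) throws IOException {
        reuse.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return reuse;
    }

    public static void main(String[] args) throws IOException {
        byte[] blank = serialize(new OptionalWord(null));
        byte[] full = serialize(new OptionalWord("hadoop"));
        System.out.println(blank.length); // 1: just the flag
        System.out.println(full.length);  // 9: flag + 2-byte length + 6 bytes

        OptionalWord reused = new OptionalWord();
        System.out.println(deserialize(full, reused).getWord());  // hadoop
        System.out.println(deserialize(blank, reused).getWord()); // null
    }
}
```

    Note that with this scheme a blank string such as "   " comes back as null after a round trip, which is the same trade-off the isNotBlank check above makes.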
    
  • 2021-01-30 11:59

    The key/value types must be specified at runtime, so anything that writes or reads NullWritables knows ahead of time that it will be dealing with that type; there is no marker of any kind in the file. Technically the NullWritables are still "read"; it's just that "reading" a NullWritable is a no-op. You can see for yourself that nothing at all is written or read:

    NullWritable nw = NullWritable.get();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    nw.write(new DataOutputStream(out));
    System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"
    
    ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
    nw.readFields(new DataInputStream(in)); // works just fine
    

    And as for your question about new Text(null), again, you can try it out:

    Text text = new Text((String) null); // throws NullPointerException here: the constructor encodes the string immediately
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    text.write(new DataOutputStream(out)); // never reached
    System.out.println(Arrays.toString(out.toByteArray()));
    

    Text will not work at all with a null String.
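
    Since a null String is ruled out, the realistic alternative to NullWritable is an empty string, and that still costs a length prefix on every record. The sketch below illustrates the difference using only the JDK. The class PlaceholderSize and its methods are hypothetical stand-ins, not Hadoop classes: writeNothing mimics a placeholder that serializes to zero bytes, and writeEmptyString uses DataOutputStream.writeUTF, whose prefix is two bytes. Hadoop's Text uses a variable-length prefix instead, so the exact per-record cost there differs, but it is not zero.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class (assumption: JDK-only stand-ins, not Hadoop types).
public class PlaceholderSize {

    // Stand-in for a NullWritable-style placeholder: serializes to nothing.
    static void writeNothing(DataOutputStream out) {
        // intentionally empty
    }

    // Stand-in for an empty-string placeholder: writes a length prefix of 0.
    static void writeEmptyString(DataOutputStream out) throws IOException {
        out.writeUTF("");
    }

    // Serializes the chosen placeholder `records` times and returns the total size.
    static int bytesFor(int records, boolean useEmptyString) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        for (int i = 0; i < records; i++) {
            if (useEmptyString) {
                writeEmptyString(out);
            } else {
                writeNothing(out);
            }
        }
        out.flush();
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesFor(1000, false)); // 0
        System.out.println(bytesFor(1000, true));  // 2000
    }
}
```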
