What do job.setOutputKeyClass and job.setOutputReduceClass refer to?

心在旅途 2020-12-24 14:44

I thought that they refer to the Reducer but in my program I have

public static class MyMapper extends Mapper< LongWritable, Text, Text, Text >

1 Answer
  • 2020-12-24 15:17

    Calling job.setOutputKeyClass( NullWritable.class ); sets the key type expected as output from both the map and reduce phases.

    If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods. These implicitly set the input types expected by the Reducer.

    (source: Yahoo Developer Tutorial)
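
    The fallback described above can be sketched in plain Java, without any Hadoop dependency. This is only a model of the rule, not Hadoop's code: the class OutputTypeModel and its string-based type names are hypothetical, and its setters merely stand in for the setOutputKeyClass() and setMapOutputKeyClass() calls of a real job.

```java
// Plain-Java model of the fallback rule.
// OutputTypeModel is a hypothetical stand-in, NOT a Hadoop class.
final class OutputTypeModel {
    private String outputKeyClass;     // what setOutputKeyClass(...) would record
    private String mapOutputKeyClass;  // what setMapOutputKeyClass(...) would record

    void setOutputKeyClass(String name) { outputKeyClass = name; }

    void setMapOutputKeyClass(String name) { mapOutputKeyClass = name; }

    // Key type the reduce phase is expected to emit.
    String reduceOutputKeyClass() { return outputKeyClass; }

    // Key type the map phase is expected to emit: the map-specific
    // setting if one was given, otherwise the job-wide output key type.
    String mapOutputKeyClass() {
        return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
    }

    public static void main(String[] args) {
        OutputTypeModel job = new OutputTypeModel();

        // Only the job-wide type is set: it applies to both phases.
        job.setOutputKeyClass("NullWritable");
        System.out.println("map key:    " + job.mapOutputKeyClass());

        // The mapper emits Text keys, so override the map side only.
        job.setMapOutputKeyClass("Text");
        System.out.println("map key:    " + job.mapOutputKeyClass());
        System.out.println("reduce key: " + job.reduceOutputKeyClass());
    }
}
```

    For the mapper in the question, which declares Text output keys and values, the matching calls on a real job would presumably be setMapOutputKeyClass( Text.class ) and setMapOutputValueClass( Text.class ), alongside setOutputKeyClass( NullWritable.class ) for the reducer.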

    Regarding your second question, the default InputFormat is TextInputFormat. It treats each line of each input file as a separate record and performs no parsing. You can call these methods if you need to process your input in a different format. Here are some examples:

    InputFormat             | Description                                      | Key                                      | Value
    --------------------------------------------------------------------------------------------------------------------------------------------------------
    TextInputFormat         | Default format; reads lines of text files        | The byte offset of the line              | The line contents
    KeyValueInputFormat     | Parses lines into key, val pairs                 | Everything up to the first tab character | The remainder of the line
    SequenceFileInputFormat | A Hadoop-specific high-performance binary format | user-defined                             | user-defined
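
    To make the Key and Value columns concrete, here is a small plain-Java sketch of how the first two formats carve up text. It does not use Hadoop: the class InputRecordSketch and its methods are hypothetical, and it assumes UTF-8 input with '\n' line endings. In the Hadoop API the tab-splitting format is named KeyValueTextInputFormat.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Hadoop classes.
final class InputRecordSketch {

    // TextInputFormat-style records: key = byte offset of the line,
    // value = the line contents. Assumes '\n' line endings.
    static List<String[]> textRecords(String content) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new String[] { Long.toString(offset), line });
            // Advance past the line's bytes plus the newline byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Key/value-style records: key = everything up to the first tab,
    // value = the remainder of the line. A line without a tab becomes
    // a key with an empty value.
    static String[] keyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        for (String[] r : textRecords("apple\t3\nbanana\t7\n")) {
            String[] kv = keyValue(r[1]);
            System.out.println("offset=" + r[0] + " key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```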
    

    The default instance of OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. Some examples below:

    OutputFormat             | Description
    ---------------------------------------------------------------------------------------------------------
    TextOutputFormat         | Default; writes lines in "key \t value" form
    SequenceFileOutputFormat | Writes binary files suitable for reading into subsequent MapReduce jobs
    NullOutputFormat         | Disregards its inputs
    

    (source: Other Yahoo Developer Tutorial)
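
    The "key \t value" line form can likewise be sketched in plain Java. TextOutputSketch is a hypothetical helper, not a Hadoop class, and it uses a Java null to stand in for a NullWritable key or value, on the assumption that such a field is simply left out of the line along with the tab separator.

```java
// Hypothetical illustration of the "key \t value" line layout;
// not Hadoop's implementation.
final class TextOutputSketch {

    // Builds one output line. A null key or value (standing in for
    // NullWritable here) is omitted, and the tab is written only
    // when both fields are present.
    static String formatLine(Object key, Object value) {
        if (key == null && value == null) {
            return ""; // nothing to write for this record
        }
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append('\t');
        }
        if (value != null) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine("word", 3));       // key, tab, value
        System.out.print(formatLine(null, "value only")); // no key, no tab
    }
}
```

    Under that assumption, a reducer that emits NullWritable keys, as in the question, would produce lines containing only the value.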
