How to unit test custom RecordReader and InputFormat classes?

失恋的感觉 2021-01-14 14:38

I have developed a MapReduce program with custom RecordReader and InputFormat classes.

I am using MR Unit and <

2 Answers
  • 2021-01-14 14:48

    You'll need a test file to be available (I'm assuming your input format extends FileInputFormat). Once you have one, configure a Configuration object to use the LocalFileSystem (set fs.default.name or fs.defaultFS to file:///). Finally, define a FileSplit with the path, offset and length of the file (or of the part of the file you want to read).

    // DISCLAIMER: not compiled or tested
    Configuration conf = new Configuration(false);
    conf.set("fs.default.name", "file:///");
    
    File testFile = new File("path/to/file");
    // FileSplit takes a Path, and java.io.File exposes length(), not getLength().
    FileSplit split = new FileSplit(
           new Path(testFile.getAbsoluteFile().toURI()), 0, 
           testFile.length(), null); 
    
    MyInputFormat inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
    // On Hadoop 2 and later TaskAttemptContext is an interface;
    // construct a TaskAttemptContextImpl there instead.
    RecordReader reader = inputFormat.createRecordReader(split, 
           new TaskAttemptContext(conf, new TaskAttemptID()));
    

    Now you can assert that the records returned by the reader match what you expect. You should also test (if your file format supports it) changing the offset and length of the split, as well as reading a compressed version of the file.
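The split-offset check suggested above can be organised as a loop over split sizes. The sketch below uses only the JDK so it runs without Hadoop on the classpath. `SplitBoundaryCheck`, `splitRanges` and `readSplit` are hypothetical names, and `readSplit` is a simplified stand-in for a line-oriented reader (a record belongs to the split that contains its first byte), not Hadoop's own logic. In a real test, each range would presumably become `new FileSplit(path, start, length, null)` and the records would come from your own RecordReader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryCheck {

    /** Cuts [0, fileLength) into consecutive {start, length} ranges of at most splitSize bytes. */
    public static List<long[]> splitRanges(long fileLength, long splitSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            ranges.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return ranges;
    }

    /**
     * Hypothetical stand-in for a record reader: returns the newline-delimited
     * records whose first byte lies inside [start, start + length).
     */
    public static List<String> readSplit(byte[] data, long start, long length) {
        int pos = (int) start;
        int end = (int) Math.min(start + length, data.length);
        // A split that begins mid-line skips ahead to the next line start;
        // the partial line belongs to an earlier split.
        while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
            pos++;
        }
        List<String> records = new ArrayList<>();
        while (pos < end) {
            int lineEnd = pos;
            // A record may run past the end of the split; read it to completion.
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1; // step over the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        List<String> whole = readSplit(data, 0, data.length);

        // For every split size, the records gathered from all splits must
        // equal the records read from the file as a single split.
        for (long splitSize = 1; splitSize <= data.length; splitSize++) {
            List<String> joined = new ArrayList<>();
            for (long[] range : splitRanges(data.length, splitSize)) {
                joined.addAll(readSplit(data, range[0], range[1]));
            }
            if (!joined.equals(whole)) {
                throw new AssertionError("split size " + splitSize + " produced " + joined);
            }
        }
        System.out.println("All split sizes produced the same records.");
    }
}
```

The property worth asserting is that no record is lost or duplicated at a split boundary, whatever the split size. Trying every size from one byte up to the file length is affordable for a small test file.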

  • 2021-01-14 15:05

    Thanks to user7610.

    Here is a compiled and lightly tested version of the example code from the answer above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.TaskAttemptID;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
    import org.apache.hadoop.util.ReflectionUtils;
    import java.io.File;
    
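    public class MyInputFormatTest {
        public static void main(String[] args) throws Exception {
            // Empty configuration that points at the local file system.
            Configuration conf = new Configuration(false);
            conf.set("fs.default.name", "file:///");
    
            File testFile = new File("path/to/file");
            Path path = new Path(testFile.getAbsoluteFile().toURI());
            FileSplit split = new FileSplit(path, 0, testFile.length(), null);
    
            InputFormat inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
            TaskAttemptContext context = new TaskAttemptContextImpl(conf, new TaskAttemptID());
            RecordReader reader = inputFormat.createRecordReader(split, context);
    
            // The framework normally calls initialize(); a test has to do it itself.
            reader.initialize(split, context);
    
            // Replace the println with assertions on the records you expect.
            while (reader.nextKeyValue()) {
                System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
            }
            reader.close();
        }
    }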
    