I have developed a MapReduce program. I have written custom RecordReader and InputFormat classes.
I am using MR Unit
and <
You'll need a test file to be available (I'm assuming your input format extends FileInputFormat). Once you have this, you can configure a Configuration object to use the LocalFileSystem (fs.default.name, or fs.defaultFS on newer Hadoop versions, set to file:///). Finally, you'll need to define a FileSplit with the path, offset and length of the file (or part of the file).
// DISCLAIMER: not compiled or tested
Configuration conf = new Configuration(false);
conf.set("fs.default.name", "file:///");
File testFile = new File("path/to/file");
// FileSplit takes a Hadoop Path; java.io.File exposes length(), not getLength()
FileSplit split = new FileSplit(
new Path(testFile.getAbsoluteFile().toURI()), 0,
testFile.length(), null);
MyInputFormat inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
// On Hadoop 2+, TaskAttemptContext is an interface; use TaskAttemptContextImpl there
RecordReader reader = inputFormat.createRecordReader(split,
new TaskAttemptContext(conf, new TaskAttemptID()));
Now you can assert that the records returned by the reader match what you expect. If your file format supports it, you should also test with different split offsets and lengths, and with a compressed version of the file.
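To exercise different offsets, lengths and a compressed input, it can help to generate the fixtures in the test itself. The following is only a sketch using plain java.nio and java.util.zip, with no Hadoop dependency. The class SplitFixtures and its methods are hypothetical helpers, not part of Hadoop or MRUnit, and a newline-delimited record format is assumed. Each {offset, length} pair it produces would be passed to a FileSplit as in the code above. Note that Path here is java.nio.file.Path, not org.apache.hadoop.fs.Path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for building RecordReader test fixtures. */
final class SplitFixtures {

    /** Writes the given lines, each newline-terminated, to a new temporary file. */
    static Path writeFixture(List<String> lines) throws IOException {
        Path file = Files.createTempFile("records", ".txt");
        file.toFile().deleteOnExit();
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** Writes a gzip-compressed copy next to the source file, with a .gz suffix. */
    static Path gzipCopy(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        target.toFile().deleteOnExit();
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            Files.copy(source, out);
        }
        return target;
    }

    /** Returns {offset, length} pairs covering [0, fileLength) in chunks of at most splitSize bytes. */
    static List<long[]> splitRanges(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            ranges.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
        }
        return ranges;
    }

    /** Decompresses a gzip file fully, to check that the compressed fixture round-trips. */
    static byte[] gunzip(Path gz) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        Path plain = writeFixture(Arrays.asList("a,1", "b,2", "c,3"));
        Path compressed = gzipCopy(plain);
        // Small split size on purpose, so that split boundaries fall mid-record.
        for (long[] range : splitRanges(Files.size(plain), 5)) {
            System.out.println("offset=" + range[0] + " length=" + range[1]);
        }
        System.out.println("compressed fixture: " + compressed);
    }
}
```

A deliberately small split size puts boundaries in the middle of records, which is where custom readers most often drop or duplicate a record. The records read across all splits, concatenated, should equal the records read from one split covering the whole file. Hadoop's codec lookup is keyed on the file extension, hence the .gz suffix. Whether the compressed copy is readable depends on your RecordReader actually using a compression codec.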
Thanks to user7610, here is a compiled and somewhat tested version of the example code from the answer:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.hadoop.util.ReflectionUtils;
import java.io.File;
Configuration conf = new Configuration(false);
conf.set("fs.default.name", "file:///");
File testFile = new File("path/to/file");
Path path = new Path(testFile.getAbsoluteFile().toURI());
FileSplit split = new FileSplit(path, 0, testFile.length(), null);
InputFormat inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
TaskAttemptContext context = new TaskAttemptContextImpl(conf, new TaskAttemptID());
RecordReader reader = inputFormat.createRecordReader(split, context);
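// createRecordReader() does not initialize the reader; the framework normally calls initialize() itself
reader.initialize(split, context);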