Question
Mahout has a command for creating sequence files:
bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>
I want to use this functionality from code, through the API.
Answer 1:
You can do something like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outputPath = new Path("c:\\temp"); // Location of the sequence file to write
Text key = new Text();   // Example, this can be another type of class
Text value = new Text(); // Example, this can be another type of class
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());
while (condition) { // Placeholder: loop over the records you want to store
    key.set("Some text");   // Reuse the Text objects instead of reassigning them
    value.set("Some text");
    writer.append(key, value);
}
writer.close();
You can find more information here and here
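The loop above leaves open where the keys and values come from. One option, sketched here with plain JDK classes only, is to gather one (key, value) pair per file in a directory and pass each pair to writer.append. The class name DirectoryPairs and the choice of the relative file path as the key are assumptions made for illustration; they are not part of Hadoop or Mahout.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Hadoop or Mahout class).
class DirectoryPairs {

    // Returns one entry per regular file under root:
    // key = path relative to root (with '/' separators), value = decoded file text.
    static Map<String, String> read(Path root, Charset charset) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> pairs = new TreeMap<>(); // sorted keys give a stable order
        for (Path file : files) {
            String key = root.relativize(file).toString().replace('\\', '/');
            String text = new String(Files.readAllBytes(file), charset);
            pairs.put(key, text);
        }
        return pairs;
    }
}
```

Each entry can then be copied into the Text objects with key.set(entry.getKey()) and value.set(entry.getValue()) before calling writer.append(key, value). Reading every file fully into memory is only reasonable for small inputs.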
Alternatively, you can call the exact functionality you described directly, using Mahout's org.apache.mahout.text.SequenceFilesFromDirectory class.
The call then looks something like this:
ToolRunner.run(new SequenceFilesFromDirectory(), args); // args is a String[] holding your parameters
ToolRunner comes from org.apache.hadoop.util.ToolRunner.
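The parameters are the same tokens you would type after seqdirectory on the command line. A minimal sketch of building that array, assuming only the -c, -i and -o options shown in the question (the class SeqDirectoryArgs is a hypothetical helper, not part of Mahout):

```java
import java.util.Objects;

// Hypothetical helper (not a Mahout class).
class SeqDirectoryArgs {

    // Mirrors: bin/mahout seqdirectory -c <charset> -i <input> -o <output>
    static String[] build(String charset, String input, String output) {
        Objects.requireNonNull(charset, "charset");
        Objects.requireNonNull(input, "input");
        Objects.requireNonNull(output, "output");
        return new String[] {"-c", charset, "-i", input, "-o", output};
    }
}

// With Hadoop and Mahout on the classpath, the call would then be along the lines of:
//   int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(),
//           SeqDirectoryArgs.build("UTF-8", inputDir, outputDir));
// where inputDir and outputDir are your own paths.
```

ToolRunner.run returns the tool's exit code and is declared to throw Exception, so the calling method has to handle or declare it.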
Hope this was of help.
Source: https://stackoverflow.com/questions/11645294/how-can-i-use-mahouts-sequencefile-api-code