I am trying to use Hadoop to train multiple models. My data is small enough to fit in memory, so I want to train one model in every map task.
My problem is that once a model has finished training, I need to send it to the reducer. I am using Weka to train the model. I don't want to implement the Writable interface for the Weka classes, because that would take a lot of effort. I am looking for a simple way to do this.
The Classifier class in Weka implements the Serializable interface. How can I send this object to the reducer?
Edit:
Here is a link that describes serialization of Weka objects: http://weka.wikispaces.com/Serialization
Here is what my code looks like. Configuring the job (only part of the configuration is shown):
conf.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization," + "org.apache.hadoop.io.serializer.WritableSerialization");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Classifier.class);
Map function:
// load the dataset into the data variable (a Weka Instances object)
Classifier tree = new J48();
tree.buildClassifier(data);
context.write(new Text("whatever"), tree);
My Map class extends Mapper<Object, Text, Text, Classifier>.
But I am getting this error:
java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:964)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
What am I doing wrong?
You can define your own serialization mechanism:
- http://www.lexemetech.com/2008/07/rpc-and-serialization-with-hadoop.html
- https://issues.apache.org/jira/browse/HADOOP-1986
I think it revolves around implementing the Serialization interface and registering your implementation in the io.serializations configuration property.
In your case, if you just want to use Java serialization, set this property to:
org.apache.hadoop.io.serializer.JavaSerialization
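For reference, here is a minimal sketch of the plain java.io round trip that, as far as I know, JavaSerialization relies on for any Serializable value. It uses only the JDK, so it can be run outside Hadoop to check that an object survives serialization. StubModel, ModelBytes, toBytes and fromBytes are hypothetical names made up for this illustration; StubModel merely stands in for a trained Weka classifier. The resulting byte array could probably also be carried in an ordinary Writable such as BytesWritable, but that is an assumption I have not tested here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ModelBytes {

    // Hypothetical stand-in for a trained classifier; any Serializable object works.
    static class StubModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int depth;

        StubModel(String name, int depth) {
            this.name = name;
            this.depth = depth;
        }
    }

    // Serialize an object to a byte array with standard Java serialization.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return buffer.toByteArray();
    }

    // Rebuild the object from the bytes, e.g. on the reducer side.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StubModel trained = new StubModel("J48", 3);
        byte[] payload = toBytes(trained);
        StubModel restored = (StubModel) fromBytes(payload);
        System.out.println(restored.name + " depth=" + restored.depth);
    }
}
```

If this round trip fails for your classifier outside Hadoop (for example with a NotSerializableException wrapped in the IOException), the problem is in the object itself rather than in the job configuration.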
Source: https://stackoverflow.com/questions/9913626/hadoop-easy-way-to-have-object-as-output-value-without-writable-interface