Question
I am running a Cascading (actually Scalding) Hadoop job that uses the DistributedCache for dependent jars.
The first time it works fine (meaning that the classpath is set up correctly), but then it starts failing with a ClassNotFoundException:
java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.io.MultiInputSplit
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385)
...
Has anybody else had success with Cascading and jars in the DistributedCache?
This message seems to imply that Cascading has some internal handling of the distributed cache jars. Any light you can shed on this?
Edit: I am using Cascading 2.1.6 on Hadoop 1.0.3
Answer 1:
Which version of Hadoop are you using? There are some problems with the distributed cache in 0.20.2. Can you try switching to a newer version?
Answer 2:
Chris K Wensel, the author of Cascading, responded on the mailing list that Cascading does not do anything with the DistributedCache.
I looked further and it turned out to be a problem in my code: I had not added these files to the DistributedCache properly.
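For anyone chasing the same symptom, here is a minimal diagnostic sketch. It is not the code from my job; the class name ClasspathCheck and the helper locate are made up for illustration, and it uses only the JDK. It tries to load a class by name, as the Class.forName call in the stack trace does, and reports which jar or directory it came from. Running something like it inside a task (for example from a mapper's setup code, which is an assumption about where you can hook in) should show whether the jars from the DistributedCache actually reached the task classpath.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of Cascading or Hadoop.
final class ClasspathCheck {

    // Returns where the class was loaded from, or null if it is not on the classpath.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes typically have no code source.
            return src == null ? "(bootstrap or unknown source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "cascading.tap.hadoop.io.MultiInputSplit";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
        // The raw classpath of the current JVM, for comparison.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If the class is reported as missing on some runs but not others, the jar registration is the first thing to check, not Cascading itself.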
Source: https://stackoverflow.com/questions/17861614/cascading-libjars-classnotfoundexception-sometimes