I've searched for some time now and none of the solutions seem to work for me.
Pretty straightforward - I want to upload data from my local file system to HDFS using the Java API.
Two things:
When you run your jar with the hadoop jar command, the class that gets executed is RunJar, not your driver class; RunJar then launches your job. For more details you can see the code here: https://github.com/apache/hadoop/blob/2087eaf684d9fb14b5390e21bf17e93ac8fea7f8/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/RunJar.java#L139
If you review the createClassLoader method in the RunJar class, you will notice that several locations are included in the classpath.
So if you execute your class directly with the java -jar command, you may be skipping all the other steps that hadoop jar performs to run your job on Hadoop.
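For completeness, if you do want to launch the job the hadoop jar way, a common pattern is a driver built on ToolRunner. This is only a minimal sketch; the class name, jar name, and job setup are placeholders, not code from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver; run it as: hadoop jar myjob.jar com.example.MyDriver <args>
public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();   // cluster config that hadoop jar put on the classpath
        // ... configure and submit your job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}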
Kasa, you need to use the method
public static FileSystem get(URI uri, Configuration conf)
to get fs; the uri parameter is necessary if you use the java -jar command.
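A minimal sketch of that call; the class name, NameNode host/port, and paths below are placeholders you would replace with your own:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {   // hypothetical class name
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // passing the URI pins the client to the cluster even without hadoop conf xmls on the classpath
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:8020"), conf);
        fs.copyFromLocalFile(new Path("/local/path/file.txt"), new Path("/hdfs/path/file.txt"));
        fs.close();
    }
}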
I am not sure about the approach you are following, but below is one way data can be uploaded to HDFS using the Java libraries:
// imports required
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// some class here .....
Configuration conf = new Configuration();
// e.g. "hdfs://<namenode-host>:8020"
conf.set("fs.defaultFS", "<hdfs write endpoint>");
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("<local src>"), new Path("<hdfs dst>"));
Also, if you have the Hadoop conf XMLs locally, you can include them on your classpath. The HDFS details will then be picked up automatically at runtime, and you will not need to set "fs.defaultFS". Note that on an old HDFS version you might need to use "fs.default.name" instead of "fs.defaultFS". If you are not sure of the HDFS endpoint, it is usually the HDFS NameNode URL. Here is an example from a previous similar question: copying directory from local system to hdfs java code
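As a quick sanity check (just a sketch; the two property names are the standard Hadoop ones, everything else is illustrative), you can print which endpoint a classpath-loaded Configuration actually resolves to:

import org.apache.hadoop.conf.Configuration;

// with core-site.xml / hdfs-site.xml on the classpath, Configuration loads them automatically
Configuration conf = new Configuration();
System.out.println(conf.get("fs.defaultFS"));     // newer property name
System.out.println(conf.get("fs.default.name"));  // deprecated name used by older versions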