Upload data to HDFS with Java API

爱一瞬间的悲伤 2021-01-24 19:19

I've searched for some time now and none of the solutions seem to work for me.

Pretty straightforward: I want to upload data from my local file system to HDFS using the Java API.

3 Answers
  •  一生所求
    2021-01-24 19:42

    Two things:

    1. If you are creating a Hadoop client, it is usually better to add the hadoop-client dependency, which pulls in the dependencies required by all of Hadoop's client sub-modules: https://github.com/apache/hadoop/blob/2087eaf684d9fb14b5390e21bf17e93ac8fea7f8/hadoop-client/pom.xml. The exception is when JAR size is a concern and you are very sure you won't need any other dependency.
    2. When you run a job with the hadoop jar command, the class that is actually executed is RunJar, not your driver class; RunJar then runs your job. For more details, see the code here: https://github.com/apache/hadoop/blob/2087eaf684d9fb14b5390e21bf17e93ac8fea7f8/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/RunJar.java#L139

    If you review the createClassLoader method in the RunJar class, you will notice that it adds several locations to the classpath.
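    To make the classpath point concrete, here is a simplified, stdlib-only sketch modeled on the default (non-isolated) branch of RunJar.createClassLoader; it is not the Hadoop source. It assumes the job JAR has already been unpacked into a working directory, collects the unpacked root, the JAR itself, workDir/classes/ and every file in workDir/lib/, and wraps them in a URLClassLoader whose parent is the system class loader. The class and method names are hypothetical, and sorting the lib/ entries is an addition made here only for deterministic output.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch modeled on RunJar.createClassLoader (default, non-isolated branch).
// Hypothetical class name; this is NOT the Hadoop source code.
public class RunJarClasspathSketch {

    // Builds the job classpath for a JAR whose contents were already unpacked into workDir.
    static List<URL> jobClasspath(File jarFile, File workDir) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        urls.add(new File(workDir + "/").toURI().toURL());       // unpacked JAR root
        urls.add(jarFile.toURI().toURL());                        // the job JAR itself
        urls.add(new File(workDir, "classes/").toURI().toURL()); // workDir/classes/
        File[] libs = new File(workDir, "lib").listFiles();       // bundled dependency JARs
        if (libs != null) {
            Arrays.sort(libs); // not in RunJar: sorted here only to make the order deterministic
            for (File lib : libs) {
                urls.add(lib.toURI().toURL());
            }
        }
        return urls;
    }

    // Parent-delegating loader: each lookup goes to the system class loader (the JVM
    // classpath, assumed here to be the one the hadoop launcher script set up with
    // Hadoop's JARs and configuration directory) first, then to the job entries.
    static ClassLoader jobClassLoader(File jarFile, File workDir) throws MalformedURLException {
        return new URLClassLoader(jobClasspath(jarFile, workDir).toArray(new URL[0]));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: RunJarClasspathSketch <job.jar> <unpacked-dir>");
            return;
        }
        for (URL url : jobClasspath(new File(args[0]), new File(args[1]))) {
            System.out.println(url);
        }
    }
}
```

    The point of the sketch is the parent: everything the launcher placed on the JVM classpath stays visible to your job's classes.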

    Consequently, if you run your class directly with the java -jar command, you may be skipping the other steps that hadoop jar performs to run your job in Hadoop.
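    One such step is that the hadoop launcher puts the Hadoop configuration directory (holding core-site.xml and hdfs-site.xml) on the classpath, and Hadoop's Configuration loads core-site.xml from the classpath. With java -jar, the classpath comes only from the JAR and its manifest, so those files are often missing and fs.defaultFS falls back to its default, file:///. If the code obtains its FileSystem with FileSystem.get(conf) and no explicit hdfs:// URI, the "upload" would then go to the local disk. That scenario is an assumption about the asker's setup, not something stated above. The small stdlib-only check below (hypothetical class name) reports whether the site files are visible to a given class loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: checks whether Hadoop's site files are visible on the classpath.
public class HadoopConfOnClasspath {

    // Site files an HDFS client typically picks up from the classpath.
    static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml"};

    // Maps each site file name to whether the given class loader can locate it.
    static Map<String, Boolean> findSiteFiles(ClassLoader loader) {
        Map<String, Boolean> found = new LinkedHashMap<>();
        for (String name : SITE_FILES) {
            found.put(name, loader.getResource(name) != null);
        }
        return found;
    }

    public static void main(String[] args) {
        ClassLoader loader = HadoopConfOnClasspath.class.getClassLoader();
        findSiteFiles(loader).forEach((name, present) ->
                System.out.println(name + (present ? ": found on classpath" : ": NOT on classpath")));
    }
}
```

    Calling findSiteFiles from your driver and launching it both ways (java -jar versus hadoop jar) should show whether the configuration directory is the missing piece.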
