How to connect to remote HDFS

Submitted by 江枫思渺然 on 2019-12-25 03:52:43

Question


I am trying to connect to an HDFS instance running on a remote machine.

I am running Eclipse on a Windows machine, and HDFS is running on a Unix box. Here is what I have tried:

    // Imports used: org.apache.hadoop.conf.Configuration,
    // org.apache.hadoop.hdfs.DFSClient, java.io.IOException
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://remoteHostName:portNumber");
    DFSClient client = null;
    System.out.println("try");
    try {
        System.out.println("trying");
        client = new DFSClient(conf);
        System.out.println(client);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the client if it was created
        if (client != null) {
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

But this gives me the following exception:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.ipc.RPC.getProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/conf/Configuration;Ljavax/net/SocketFactory;ILorg/apache/hadoop/io/retry/RetryPolicy;Z)Lorg/apache/hadoop/ipc/VersionedProtocol;
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:135)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:280)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:235)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:226)

By the way, I got the portNumber from the hdfs-site.xml on the remote machine.

Is this approach correct?

Also, would it be easier to do this in Python?

EDIT

Note that I do have the Hadoop binaries unzipped on my Windows machine, and I have set the HADOOP_HOME environment variable accordingly. Could this be causing a problem?


Answer 1:


For your specific problem, see the question "Hadoop 2.6.0 Browsing filesystem Java".
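
In general, a NoSuchMethodError like the one in the question is thrown when the class found at runtime does not have a method that the calling code was compiled against, which often points to mixed library versions on the classpath. Whether that is the cause here is an assumption, but it is cheap to check. The sketch below is only a diagnostic: it prints which jar each class was loaded from, using the two class names from the stack trace. The class name JarLocator and the helper locate are made up for illustration.

```java
import java.security.CodeSource;

public class JarLocator {
    /** Returns the location a class was loaded from, or a short note if unknown. */
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (the JDK itself) have no code source.
            return source == null ? "(bootstrap/JDK)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in the question.
        String[] names = {
            "org.apache.hadoop.ipc.RPC",
            "org.apache.hadoop.hdfs.DFSClient"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two classes come from jars with different Hadoop versions, aligning the versions in the Eclipse build path would be the first thing to try.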

Beyond that, you might want to consider using REST for remote interactions. Apache Knox can give you access to the remote cluster and shield your code from having to know cluster internals such as the host and port, whether Kerberos is enabled, and so on. These details can change out from under your remote clients.
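
To give a rough idea of what the REST route looks like, here is a minimal sketch that builds a WebHDFS LISTSTATUS URL and fetches it with the JDK's own HTTP classes, so no Hadoop jars are needed on the client. The host knox.example.com, port 8443, the gateway path gateway/default, and the names WebHdfsUrls, listStatus and fetch are all placeholders for illustration. Authentication is left out, and a real Knox gateway will normally require it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class WebHdfsUrls {
    /** Builds a LISTSTATUS URL; baseUrl is everything before "/webhdfs/v1". */
    public static URI listStatus(String baseUrl, String hdfsPath) throws URISyntaxException {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        URI base = new URI(trimmed);
        // The multi-argument constructor percent-encodes illegal path characters.
        return new URI(base.getScheme(), base.getAuthority(),
                base.getPath() + "/webhdfs/v1" + hdfsPath, "op=LISTSTATUS", null);
    }

    /** Issues a plain GET and returns the response body (no authentication). */
    public static String fetch(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder gateway address; replace with your own before calling fetch().
        URI uri = listStatus("https://knox.example.com:8443/gateway/default", "/tmp");
        System.out.println(uri);
    }
}
```

The same listStatus helper also works against a NameNode's HTTP address directly if WebHDFS is enabled there; only the base URL changes.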



Source: https://stackoverflow.com/questions/33610916/how-to-connect-to-remote-hdfs
