I’ve downloaded and started up Cloudera’s Hadoop Demo VM for CDH4 (running Hadoop 2.0.0). I’m trying to write a Java program that will run from my Windows 7 machine (The same
From the error message, the replication factor seems to be fine, i.e. 1. It seems the datanode is either not functioning properly or has permission issues. Check the permissions and check the status of the datanode from the user you are trying to run Hadoop as.
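If you prefer to do that check from the Java side rather than the command line, here is a minimal sketch; the namenode host and port are placeholders for your own values, and it simply asks the namenode for the datanode report, roughly what hdfs dfsadmin -report prints:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address; replace with your own host and port.
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:8020"), conf);
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // One entry per datanode the namenode knows about, including its state.
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getHostName());
                System.out.println(dn.getDatanodeReport());
            }
        }
        fs.close();
    }
}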
Since I found many questions like this one while searching for the exact same issue, I thought I would share what finally worked for me. I found this forum post on Hortonworks: https://community.hortonworks.com/questions/16837/cannot-copy-from-local-machine-to-vm-datanode-via.html
The answer was really about understanding what calling new Configuration() means and setting the correct parameters as I needed them. In my case it was exactly the one mentioned in that post. So my working code looks like this:
try {
    Configuration config = new Configuration();
    config.set("dfs.client.use.datanode.hostname", "true");
    Path pdFile = new Path("stgicp-" + pd);
    FileSystem dFS = FileSystem.get(new URI("hdfs://" + HadoopProperties.HIVE_HOST + ":" + HadoopProperties.HDFS_DEFAULT_PORT),
            config, HadoopProperties.HIVE_DEFAULT_USER);
    if (dFS.exists(pdFile)) {
        dFS.delete(pdFile, false);
    }
    FSDataOutputStream outStream = dFS.create(pdFile);
    for (String sjWLR : processWLR.get(pd)) {
        outStream.writeBytes(sjWLR);
    }
    outStream.flush();
    outStream.close();
    dFS.delete(pdFile, false);
    dFS.close();
} catch (IOException | URISyntaxException | InterruptedException e) {
    log.error("WLR file processing error: " + e.getMessage());
}
It appears to be some issue with the file system: either the parameters in core-site.xml do not match the file system it is trying to reach,
OR
there is some mismatch in the path (I see a Windows reference in the error).
You can use the Cygwin cygpath tool to translate the path so that it points to where the datanode and temp file locations are placed; that should do the trick. Location: $/bin/cygpath.exe
P.S. Replication does NOT seem to be the primary issue here, in my opinion.
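One quick way to rule out a core-site.xml mismatch on the client side is to print the configuration values the client actually resolved before it touches HDFS. This is only a debugging sketch; the property names are the standard Hadoop ones and everything else is illustrative:

import org.apache.hadoop.conf.Configuration;

public class ConfigDump {
    public static void main(String[] args) {
        // new Configuration() loads core-site.xml (and friends) from the classpath.
        Configuration conf = new Configuration();
        // If this still prints the default file:/// or localhost, the client never saw your core-site.xml.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));
    }
}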
I had a similar problem; in my case I just emptied the following folder: ${hadoop.tmp.dir}/nm-local-dir/usercache/{{hdfs_user}}/appcache/
Add the following property in hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
and also add this file in your program (a fuller sketch follows these steps):
conf.addResource("hdfs-site.xml");
Then stop Hadoop:
stop-all.sh
and start it again:
start-all.sh
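For completeness, a minimal sketch of how the addResource call fits into the client code; the explicit file path in the comment is only an example, and the printout just confirms that the property above was actually loaded:

import org.apache.hadoop.conf.Configuration;

public class LoadHdfsSite {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Loads hdfs-site.xml from the classpath; alternatively pass an explicit path, e.g.
        // conf.addResource(new org.apache.hadoop.fs.Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource("hdfs-site.xml");
        // Should print 1 if the property above was picked up.
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}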
I ran into a similar issue and have two pieces of information that may help you.
The first thing I realized is that I was using an ssh tunnel to access the name node, and when the client code tried to access the data node it could not find it, because the tunnel somehow messed up the communication. I then ran the client on the same box as the Hadoop name node, and that solved the problem. In short, a non-standard network configuration confused Hadoop about how to find the data node.
The reason I used the ssh tunnel is that I couldn't access the name node remotely, and I thought it was due to a port restriction by the admin, so I used the ssh tunnel to bypass the restriction. But it turned out to be a misconfiguration of Hadoop.
In core-site.xml, after I changed
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
to
<name>fs.defaultFS</name>
<value>hdfs://host_name:9000</value>
I no longer needed the ssh tunnel and I could access HDFS remotely.
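To sanity-check that remote access works without the tunnel, a quick listing of the HDFS root directory is enough. A minimal sketch, where host_name and port 9000 stand in for whatever your fs.defaultFS points at:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Must match the fs.defaultFS value configured above.
        FileSystem fs = FileSystem.get(new URI("hdfs://host_name:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}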