Question
I'm having trouble getting my HDFS setup to work in Docker Swarm. To understand the problem, I've reduced my setup to the minimum:
- 1 physical machine
- 1 namenode
- 1 datanode
This setup works fine with docker-compose, but it fails with docker swarm using the same compose file.
Here is the compose file:
version: '3'
services:
  namenode:
    image: uhopper/hadoop-namenode
    hostname: namenode
    ports:
      - "50070:50070"
      - "8020:8020"
    volumes:
      - /userdata/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=hadoop-cluster

  datanode:
    image: uhopper/hadoop-datanode
    depends_on:
      - namenode
    volumes:
      - /userdata/datanode:/hadoop/dfs/data
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020
To test it, I installed a Hadoop client on my host (physical) machine with only this simple configuration in core-site.xml:
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://0.0.0.0:8020</value></property>
</configuration>
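For reference, a quick sanity check that the client actually reaches the namenode through the published port (a sketch, assuming the hdfs client tools are on the PATH):

hdfs getconf -confKey fs.defaultFS   # should print hdfs://0.0.0.0:8020
hdfs dfs -ls /                       # talks only to the namenode, so it succeeds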
Then I run the following command:
hdfs dfs -put test.txt /test.txt
With docker-compose (just running docker-compose up) it works and the file is written to HDFS.
With docker-swarm, I'm running :
docker swarm init
docker stack deploy --compose-file docker-compose.yml hadoop
Then, when all services are up, I put my file on HDFS and it fails like this:
INFO hdfs.DataStreamer: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/x.x.x.x:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:259)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1692)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1648)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)
18/06/14 17:29:41 WARN hdfs.DataStreamer: Abandoning BP-1801474405-10.0.0.4-1528990089179:blk_1073741825_1001
18/06/14 17:29:41 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[10.0.0.6:50010,DS-d7d71735-7099-4aa9-8394-c9eccc325806,DISK]
18/06/14 17:29:41 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
If I look at the web UI, the datanode seems to be up and no issue is reported...
Update: it seems that depends_on is ignored by swarm, but that does not seem to be the cause of my problem: I restarted the datanode once the namenode was up, and it did not work any better.
Thanks for your help :)
Answer 1:
The whole mess stems from the interaction between Docker Swarm's overlay networks and how the HDFS namenode keeps track of its datanodes. The namenode records the datanodes' IPs/hostnames based on their overlay-network addresses. When the HDFS client asks to read or write blocks directly on the datanodes, the namenode reports back those overlay-network IPs/hostnames. Since the overlay network is not accessible to external clients, any read/write operation will fail.
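One way to see this (a sketch; the hadoop_namenode name filter and the presence of the hdfs binary inside the image are assumptions about your stack) is to ask the namenode which datanode addresses it hands out to clients:

# Print the datanode report from inside the namenode container.
docker exec -it $(docker ps -q -f name=hadoop_namenode) hdfs dfsadmin -report | grep -iE '^(name|hostname)'
# The reported address (10.0.0.6:50010 in the trace above) is an overlay-network IP
# that an external client cannot reach, so the block write times out.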
The final solution I used (after a lot of struggling to get the overlay network to work) was to have the HDFS services use the host network. Here's a snippet from the compose file:
version: '3.7'

x-deploy_default: &deploy_default
  mode: replicated
  replicas: 1
  placement:
    constraints:
      - node.role == manager
  restart_policy:
    condition: any
    delay: 5s

services:
  hdfs_namenode:
    deploy:
      <<: *deploy_default
    networks:
      hostnet: {}
    volumes:
      - hdfs_namenode:/hadoop-3.2.0/var/name_node
    command: namenode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0

  hdfs_datanode:
    deploy:
      mode: global
    networks:
      hostnet: {}
    volumes:
      - hdfs_datanode:/hadoop-3.2.0/var/data_node
    command: datanode -fs hdfs://${PRIMARY_HOST}:9000
    image: hadoop:3.2.0

volumes:
  hdfs_namenode:
  hdfs_datanode:

networks:
  hostnet:
    external: true
    name: host
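With the host network, the namenode and datanodes bind to the physical host's interfaces, so an external client simply points at the host itself. A minimal sketch of deploying and testing this (PRIMARY_HOST, the stack name, and using hdfs's generic -fs option to override fs.defaultFS are assumptions about your environment):

# PRIMARY_HOST must resolve to the swarm manager from every client machine.
export PRIMARY_HOST=$(hostname -f)
docker stack deploy --compose-file docker-compose.yml hadoop

# The client addresses the namenode on the host network directly; the datanode
# addresses it gets back are now host IPs, so the block write can succeed.
hdfs dfs -fs hdfs://${PRIMARY_HOST}:9000 -put test.txt /test.txt
hdfs dfs -fs hdfs://${PRIMARY_HOST}:9000 -ls /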
Source: https://stackoverflow.com/questions/50861281/how-to-make-hdfs-work-in-docker-swarm