I've set up a 3-node Apache Hadoop cluster. On the master node, jps shows:
[hadoop-conf]$ jps
16856 DataNode
17051 SecondaryNameNode
16701 NameNode
21601 ResourceManager
I added the following to yarn-site.xml on all nodes, including the NameNode (assuming it will be used as a worker as well):
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>{Enter NameNode IP Address}:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>{Enter NameNode IP Address}:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>{Enter NameNode IP Address}:8040</value>
</property>
It sounds like the slave nodes have not joined your cluster, probably because of an incorrect cluster setup. To confirm, run the following command in your shell:
hdfs dfsadmin -report
You should see stats for each DataNode.
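To check how many DataNodes actually registered, you can count the per-node entries in the report. A minimal sketch; the here-doc below is a hypothetical, abbreviated report standing in for real output on a live cluster:

```shell
# Each DataNode in "hdfs dfsadmin -report" output appears in a block
# starting with "Name:". Counting those lines gives the number of
# registered DataNodes.
report=$(cat <<'EOF'
Live datanodes (3):

Name: 192.168.1.11:50010 (slave1)
Name: 192.168.1.12:50010 (slave2)
Name: 192.168.1.13:50010 (slave3)
EOF
)
# On a real cluster, use: report=$(hdfs dfsadmin -report)
echo "$report" | grep -c '^Name:'
```

If the count is 0, the DataNodes never registered with the NameNode, which points at a configuration or connectivity problem.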
Problem solved. The NodeManager needs to be told in yarn-site.xml where the ResourceManager is. Specifically, I added this property to yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
Reason: the default value in yarn-default.xml is 0.0.0.0, and many properties build the addresses they use to contact the ResourceManager from this hostname, for example:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
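You can resolve the hostname property from the file yourself and see what address YARN will derive. A minimal sketch, assuming `xmllint` is installed; the here-doc sample stands in for your real yarn-site.xml:

```shell
# Write a sample yarn-site.xml mirroring the property set above.
cat > /tmp/yarn-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
EOF

# Extract the hostname value with an XPath query, then append the
# default port 8032, the same way yarn-default.xml expands
# ${yarn.resourcemanager.hostname}:8032.
rm_host=$(xmllint --xpath \
  'string(//property[name="yarn.resourcemanager.hostname"]/value)' \
  /tmp/yarn-site-sample.xml)
echo "ResourceManager address: ${rm_host}:8032"
```

On a real cluster, point `xmllint` at `$HADOOP_CONF_DIR/yarn-site.xml` instead of the sample file.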
Answer credits: https://stackoverflow.com/a/22125279/3209177