Do users need to exist across all nodes to be recognized by the hadoop cluster / HDFS?

Submitted by 旧巷老猫 on 2020-04-07 09:22:07

Question


In MapR Hadoop, a user who wants to access HDFS or run programs on YARN must exist on every node in the cluster, with the same uid and gid on each. This includes client nodes that act as neither data nodes nor control nodes (MapR does not really have the concept of NameNodes). Is the same true for Hortonworks HDP?


Answer 1:


Found this answer on the Hortonworks community site:

The user does not need an account on every node of the cluster. They only need an account on the edge node.

For a new user, there are two types of directories we need to create before the user can access the cluster.

1. User home directory [created on the Linux filesystem, i.e. /home/]

2. User HDFS directory [created on the HDFS filesystem, i.e. /user/]

...you only need to create the HDFS home directory [i.e. /user/] on the edge node [I am not sure what this means, since HDFS does not seem to be tied to any particular edge node]. You can still run jobs on the cluster as the new user, even if you haven't created their home directory in Linux.
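As a rough illustration of the HDFS side of that setup, here is a minimal Python sketch that only builds the `hdfs dfs -mkdir` and `hdfs dfs -chown` command lines an HDFS superuser would typically run. The function name `hdfs_home_setup_commands`, the example user `alice`, and the assumption that the user's group is named after the user are all hypothetical and not taken from the quoted answer. Nothing is executed, so it can be tried without a cluster.

```python
import shlex


def hdfs_home_setup_commands(username, group=None, base="/user"):
    """Build the argv lists that would create an HDFS home directory.

    Assumption: when no group is given, the group is named after the user.
    The commands are returned, not executed.
    """
    group = group or username
    home = f"{base}/{username}"
    return [
        # Create the directory in HDFS (not on any node's local filesystem).
        ["hdfs", "dfs", "-mkdir", "-p", home],
        # Hand ownership to the new user so they can write to it.
        ["hdfs", "dfs", "-chown", f"{username}:{group}", home],
    ]


if __name__ == "__main__":
    for argv in hdfs_home_setup_commands("alice"):
        print(shlex.join(argv))
```

Because these commands go through the HDFS client to the NameNode, they can be run from any host with a configured client, which may be why the quoted answer mentions the edge node.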

Update: Based on comments by user @cricket_007, it appears that the user must also exist on the NameNode server. The closest thing I could find to documentation stating this explicitly says:

Each file or directory operation passes the full path name to the NameNode, and the permissions checks are applied along the path for each operation. The client framework will implicitly associate the user identity with the connection to the NameNode, reducing the need for changes to the existing client API. [...] For instance, when the client first begins reading a file, it makes a first request to the NameNode to discover the location of the first blocks of the file.
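One way to see why the NameNode host matters: with Hadoop's default OS-based group mapping, the NameNode resolves a user's groups from its own host configuration when it applies permission checks. The sketch below is a hedged illustration of that idea, not part of Hadoop. It checks whether a username resolves on the machine where it runs, so it would need to be run on the NameNode host to be meaningful. The function name `resolve_local_user` is hypothetical, and clusters that use a different group mapping (for example LDAP) would not behave this way.

```python
import grp
import os
import pwd


def resolve_local_user(username):
    """Return uid, gid and group names for username on this host.

    Returns None when the host's user database has no such user.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None
    groups = []
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            groups.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no name on this host; keep the number instead.
            groups.append(str(gid))
    return {"uid": entry.pw_uid, "gid": entry.pw_gid, "groups": groups}


if __name__ == "__main__":
    print(resolve_local_user("alice"))
```

If this returns None on the NameNode host for a user who can log in to the edge node, group-based permission checks for that user are likely to fail under the default mapping.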



Source: https://stackoverflow.com/questions/57319080/do-users-need-to-exist-across-all-nodes-to-be-recognized-by-the-hadoop-cluster
