I would like to submit MapReduce jobs from a Java web application to a remote Hadoop cluster, but I am unable to specify which user the job should be submitted as. I would lik
Finally I stumbled on the constant
    static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";
in the UserGroupInformation class.
Setting this either as an environment variable, as a Java system property on startup (using -D, e.g. -DHADOOP_USER_NAME=hduser) or programmatically with System.setProperty("HADOOP_USER_NAME", "hduser"); makes Hadoop use whatever username you want for connecting to the remote Hadoop cluster. Note that this only applies when the cluster uses simple (non-Kerberos) authentication.
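To illustrate the lookup order this relies on, here is a minimal sketch in plain Java with no Hadoop dependency. As far as I can tell, with simple authentication Hadoop checks the environment variable first, then the system property, and only then falls back to the OS account. The class and method names (HadoopUserLookup, resolveUser) are made up for illustration and are not part of Hadoop's API.

```java
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only: mimics the assumed lookup order, not Hadoop's actual code.
class HadoopUserLookup {
    static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

    // Environment variable wins, then the Java system property, then the OS account name.
    static String resolveUser(Map<String, String> env, Properties props, String osUser) {
        String user = env.get(HADOOP_USER_NAME);
        if (user == null) {
            user = props.getProperty(HADOOP_USER_NAME);
        }
        if (user == null) {
            user = osUser;
        }
        return user;
    }

    public static void main(String[] args) {
        // Must be set before the first Hadoop call that triggers a login.
        System.setProperty(HADOOP_USER_NAME, "hduser");
        String user = resolveUser(System.getenv(), System.getProperties(),
                System.getProperty("user.name"));
        System.out.println("Effective Hadoop user: " + user);
    }
}
```

One practical consequence of this order: if HADOOP_USER_NAME is already set in the environment of the web application's process, setting the system property in code would not override it.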
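The code below works for me and has the same effect as
System.setProperty("HADOOP_USER_NAME", "hduser"):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.util.ToolRunner;

    // Run the job as "hduser" without touching global JVM state.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hduser");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            Configuration configuration = new Configuration();
            configuration.set("hadoop.job.ugi", "hduser");
            // res is the tool's exit code; check it if you need to detect failures.
            int res = ToolRunner.run(configuration, new YourTool(), args);
            return null;
        }
    });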
I was able to resolve a similar issue by using the secure impersonation feature: http://hadoop.apache.org/docs/stable1/Secure_Impersonation.html
The following is the code snippet:
    // Impersonate "hduser" on behalf of the user the client process is logged in as.
    UserGroupInformation ugi = UserGroupInformation.createProxyUser("hduser", UserGroupInformation.getLoginUser());
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            Configuration jobconf = new Configuration();
            // "server", "hdfsport" and "jobtracker port" are placeholders for your cluster's values.
            jobconf.set("fs.default.name", "hdfs://server:hdfsport");
            jobconf.set("hadoop.job.ugi", "hduser");
            jobconf.set("mapred.job.tracker", "server:jobtracker port");
            String[] args = new String[] { "data/input", "data/output" };
            ToolRunner.run(jobconf, WordCount.class.newInstance(), args);
            return null;
        }
    });
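The login user ID on the remote client (a Windows desktop host in my case) must be added to the cluster's core-site.xml as a proxy user, as described at the URL above (the hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups properties).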