I'm using Cloudera's Hadoop distribution CDH-0.20.2CDH3u0. Is there any way I could get information such as jobtracker status, tasktracker status, and counters using a Java program?
I am not sure if this is correct, but you can try HUE. I think HUE gives information about jobs. Since it's open source, you can see how it accesses the job tracker and name tracker.
You can use the Hadoop API to access this information programmatically. In particular, instantiate JobClient with a suitable configuration for your cluster, and then call getJob on that instance to get a RunningJob. With that, you should be able to get to the detail you're looking for (the following code is completely untested, but I hope it points in the right direction):
JobClient theJobClient = new JobClient(new InetSocketAddress("your.job.tracker", 8021), new Configuration());
RunningJob theJob = theJobClient.getJob("job_id_string"); // caution, deprecated
float mapProgress = theJob.mapProgress(); // similar for reduceProgress
// etc (see RunningJob)
You can also get the list of currently-running jobs with theJobClient.jobsToComplete(), which returns an array of JobStatus. Each JobStatus should expose similar values (mapProgress, etc.) and can provide the JobID instance you could pass to getJob to obtain the RunningJob above (if you want to avoid the deprecated method).
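The jobsToComplete route described above could look roughly like the sketch below. It is untested and assumes the old org.apache.hadoop.mapred API from the 0.20.x line, with the Hadoop jars on the classpath. The class name JobProgressReport is made up for illustration, and the host name and port are placeholders for your own JobTracker address. Since the question also asks about counters, the sketch reads them through RunningJob.getCounters().

```java
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical class name; a sketch against the 0.20.x mapred API, not tested on a cluster.
public class JobProgressReport {
    public static void main(String[] args) throws IOException {
        // Placeholder host and port: substitute your own JobTracker address.
        JobClient client = new JobClient(
                new InetSocketAddress("your.job.tracker", 8021), new Configuration());

        // jobsToComplete() returns the jobs that have not finished yet.
        for (JobStatus status : client.jobsToComplete()) {
            System.out.printf("%s map=%.0f%% reduce=%.0f%%%n",
                    status.getJobID(),
                    status.mapProgress() * 100,
                    status.reduceProgress() * 100);

            // Look the job up by JobID instead of the deprecated getJob(String).
            RunningJob job = client.getJob(status.getJobID());
            if (job == null) {
                continue; // the JobTracker no longer knows about this job
            }

            Counters counters = job.getCounters();
            if (counters == null) {
                continue; // counters not available for this job
            }
            // Counters iterates over groups; each group iterates over its counters.
            for (Counters.Group group : counters) {
                for (Counters.Counter counter : group) {
                    System.out.println("  " + group.getDisplayName() + " / "
                            + counter.getDisplayName() + " = " + counter.getCounter());
                }
            }
        }
    }
}
```

Running this needs a reachable JobTracker, so treat it as a starting point and check each call against the JobClient and RunningJob Javadoc for your exact CDH version.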
Surely there are further options. Start with http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobClient.html for further details.