How could I programmatically get all the job tracker and tasktracker information that is displayed by Hadoop in the web interface?

名媛妹妹 2021-02-11 04:03

I'm using Cloudera's Hadoop distribution CDH-0.20.2CDH3u0. Is there any way I could get information such as JobTracker status, TaskTracker status, and counters using a Java program?

2 Answers
  • 2021-02-11 04:38

    I am not sure if this is correct, but you can try Hue. I think Hue gives information about jobs. Since it's open source, you can see how it accesses the JobTracker and NameNode.

  • 2021-02-11 04:43

    You can use the Hadoop API to access this information programmatically. In particular, instantiate JobClient with the suitable configuration for your cluster, and then you can use getJob on that instance to get a RunningJob. With that, you should be able to get to the detail you're looking for (following code is completely untested, but in the direction of the right idea I hope):

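    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.RunningJob;
    JobClient theJobClient = new JobClient(new InetSocketAddress("your.job.tracker", 8021), new Configuration());
    RunningJob theJob = theJobClient.getJob("job_id_string"); // caution, deprecated
    float mapProgress = theJob.mapProgress(); // similar for reduceProgress; these calls can throw IOException
    // etc (see RunningJob)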
    

    You can also get the list of jobs that have not yet completed with theJobClient.jobsToComplete(), which returns an array of JobStatus. Each JobStatus should expose similar values (mapProgress, etc.) and can provide the JobID instance you could pass to getJob to get the RunningJob above (if you want to avoid the deprecated method).
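Once you have the JobStatus array, turning it into a readable report is plain Java. Here is a minimal sketch of that step, written without any Hadoop imports so it compiles on its own. The JobReport and JobSnapshot names and the sample job id are made up for illustration; on a real cluster you would fill the snapshots from the JobStatus accessors named in the comment, which I have not run against CDH3 either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class JobReport {

    /** Hypothetical holder for the values read from one JobStatus entry. */
    static final class JobSnapshot {
        final String jobId;
        final float mapProgress;    // fraction between 0 and 1
        final float reduceProgress; // fraction between 0 and 1

        JobSnapshot(String jobId, float mapProgress, float reduceProgress) {
            this.jobId = jobId;
            this.mapProgress = mapProgress;
            this.reduceProgress = reduceProgress;
        }
    }

    /** Formats one snapshot as "<id> map=NN% reduce=NN%". */
    static String formatLine(JobSnapshot s) {
        return String.format(Locale.ROOT, "%s map=%.0f%% reduce=%.0f%%",
                s.jobId, s.mapProgress * 100f, s.reduceProgress * 100f);
    }

    /** Builds one report line per snapshot, preserving input order. */
    static List<String> report(List<JobSnapshot> snapshots) {
        List<String> lines = new ArrayList<String>();
        for (JobSnapshot s : snapshots) {
            lines.add(formatLine(s));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: on a real cluster the list would be filled like this:
        //   for (JobStatus st : theJobClient.jobsToComplete())
        //       snapshots.add(new JobSnapshot(st.getJobID().toString(),
        //                                     st.mapProgress(), st.reduceProgress()));
        List<JobSnapshot> snapshots = new ArrayList<JobSnapshot>();
        snapshots.add(new JobSnapshot("job_sample_0001", 0.5f, 0.0f)); // made-up values
        for (String line : report(snapshots)) {
            System.out.println(line);
        }
    }
}
```

Keeping the formatting separate from the JobClient calls means you can check the report logic without a running JobTracker, and swap in the real API calls later.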

    Surely there are further options. Start with http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobClient.html for further details.
