Question
When we run a data-intensive job on Hadoop, Hadoop executes it. What I want is that when the job completes, it gives me statistics about the executed job: time consumed, number of mappers, number of reducers, and other useful information.
This information is displayed in the browser (e.g. the JobTracker and DataNode pages) during job execution. But how can I get these statistics inside my own application, which runs the job on Hadoop, so that it can produce a report when the job completes? My application is in Java.
Is there any API that can help me? Suggestions will be appreciated.
Answer 1:
Look into the following methods of JobClient:
- getMapTaskReports(JobID)
- getReduceTaskReports(JobID)
Both calls return an array of TaskReport objects, from which you can pull the start/finish times and the individual counters for each task.
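The kind of per-phase summary you would build from those reports can be sketched without a cluster. The snippet below is a minimal, self-contained illustration (the `Span` record is a hypothetical stand-in for the start/finish timestamps a real `TaskReport` returns via `getStartTime()`/`getFinishTime()`); it shows the aggregation step only, not the Hadoop API itself:

```java
import java.util.List;

public class TaskStatsSketch {
    // Hypothetical stand-in for the (startTime, finishTime) pair,
    // in milliseconds, that a TaskReport provides.
    record Span(long startMillis, long finishMillis) {
        long durationMillis() { return finishMillis - startMillis; }
    }

    // Summarize a list of task spans: task count, total and longest duration.
    static String summarize(String phase, List<Span> spans) {
        long total = 0, longest = 0;
        for (Span s : spans) {
            total += s.durationMillis();
            longest = Math.max(longest, s.durationMillis());
        }
        return phase + ": " + spans.size() + " tasks, total " + total / 1000
                + "s, longest " + longest / 1000 + "s";
    }

    public static void main(String[] args) {
        // Example timestamps for three map tasks.
        List<Span> maps = List.of(
                new Span(0, 7_000), new Span(0, 20_000), new Span(0, 7_000));
        System.out.println(summarize("map", maps));
    }
}
```

In a real application you would build the `Span` list from the `TaskReport[]` arrays returned by the two `JobClient` methods above, one list for maps and one for reduces.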
Answer 2:
Chris is correct. The documentation of TaskReport states that org.apache.hadoop.mapred.TaskReport inherits those methods from org.apache.hadoop.mapreduce.TaskReport, so you can get those values.
Here is the code to get the start and finish times of a job's individual map and reduce tasks.
import java.net.InetSocketAddress;
import java.text.SimpleDateFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskReport;
import org.apache.hadoop.util.StringUtils;

public class mini {
    public static void main(String[] args) {
        String jobTrackerHost = "192.168.151.14";
        int jobTrackerPort = 54311;
        try {
            Configuration conf = new Configuration();
            JobClient jobClient = new JobClient(new InetSocketAddress(jobTrackerHost, jobTrackerPort), conf);
            JobStatus[] activeJobs = jobClient.jobsToComplete();
            SimpleDateFormat dateFormat = new SimpleDateFormat("d-MMM-yyyy HH:mm:ss");
            for (JobStatus js : activeJobs) {
                System.out.println(js.getJobID());
                RunningJob runningJob = jobClient.getJob(js.getJobID());
                // Poll until the job completes; sleep so we don't busy-wait.
                while (!runningJob.isComplete()) {
                    Thread.sleep(1000);
                }
                // Per-task reports for the map phase.
                TaskReport[] mapTaskReports = jobClient.getMapTaskReports(js.getJobID());
                for (TaskReport tr : mapTaskReports) {
                    System.out.println("Task ID: " + tr.getTaskID()
                            + " Start Time: " + StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getStartTime(), 0)
                            + " Finish Time: " + StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getFinishTime(), tr.getStartTime()));
                }
                // Per-task reports for the reduce phase.
                TaskReport[] reduceTaskReports = jobClient.getReduceTaskReports(js.getJobID());
                for (TaskReport tr : reduceTaskReports) {
                    System.out.println("Task ID: " + tr.getTaskID()
                            + " Start Time: " + StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getStartTime(), 0)
                            + " Finish Time: " + StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getFinishTime(), tr.getStartTime()));
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
This is a simple example that prints the start and finish time of each task of a running job; you can extend it in whatever way you want.
And here is a run of this program for a "Word Count" MapReduce job:
[root@dev1-slave1 ~]# java -classpath /usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:. mini
job_201501151144_0042
Task ID: task_201501151144_0042_m_000000 Start Time: 16-Jan-2015 17:07:35 Finish Time: 16-Jan-2015 17:07:43 (7sec)
Task ID: task_201501151144_0042_m_000001 Start Time: 16-Jan-2015 17:07:35 Finish Time: 16-Jan-2015 17:07:56 (20sec)
Task ID: task_201501151144_0042_m_000002 Start Time: 16-Jan-2015 17:07:35 Finish Time: 16-Jan-2015 17:07:43 (7sec)
Task ID: task_201501151144_0042_m_000003 Start Time: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:07:53 (10sec)
Task ID: task_201501151144_0042_m_000004 Start Time: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:07:53 (10sec)
Task ID: task_201501151144_0042_r_000000 Start Time: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:08:00 (17sec)
Task ID: task_201501151144_0042_r_000001 Start Time: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:08:05 (22sec)
Task ID: task_201501151144_0042_r_000002 Start Time: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:08:05 (21sec)
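The "(7sec)" suffix on each finish time is the elapsed time relative to the task's start time. The formatting can be sketched in plain Java; this is a simplified stand-in to show the idea, not Hadoop's actual StringUtils implementation:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimeDiffSketch {
    // Format a timestamp and, when a nonzero reference time is given,
    // append the elapsed whole seconds in parentheses, similar to the
    // report lines above.
    static String formatWithDiff(SimpleDateFormat fmt, long timeMillis, long refMillis) {
        String s = fmt.format(new Date(timeMillis));
        if (refMillis > 0) {
            s += " (" + (timeMillis - refMillis) / 1000 + "sec)";
        }
        return s;
    }

    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("d-MMM-yyyy HH:mm:ss");
        long start = System.currentTimeMillis();
        long finish = start + 7_000L;
        System.out.println(formatWithDiff(fmt, finish, start)); // elapsed part: (7sec)
    }
}
```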
It is also worth opening the relevant JSP files of Hadoop, in its mapreduce/src/webapps/job/ directory, to see how the JobTracker web UI displays this information. I derived the code above from jobtasks.jsp.
Hope it helps. :)
Source: https://stackoverflow.com/questions/16180654/how-to-get-completed-jobs-statistics-executed-by-hadoop