Question
My question is related to the straggler problem. Sorting is an algorithm whose complexity is known, so we can calculate its running time when it is executed on a fixed data set.
Why can't we determine job execution time in Hadoop the same way?
If we could know the job execution time or task execution time in advance, we could identify straggler tasks quickly, without needing special algorithms to detect which tasks are stragglers.
Answer 1:
You cannot reliably estimate how long a job will take before running it. After running your MapReduce job, you can measure the time it took. MapReduce performance always depends on your cluster capacity (RAM size, CPU cores, and network bandwidth) and on how many reducers you set for the job.
You can only make rough assumptions based on your RAM size and the input split size.
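One piece of such a rough estimate is the number of map tasks, which is roughly one per input split. A minimal sketch (the 10 GB input size and 128 MB split size below are hypothetical values for illustration, not figures from the question):

```python
import math

def estimate_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """Rough estimate: Hadoop launches roughly one map task per input split."""
    return math.ceil(input_bytes / split_bytes)

# Hypothetical example: 10 GB of input with a 128 MB split size.
input_size = 10 * 1024**3
split_size = 128 * 1024**2
print(estimate_map_tasks(input_size, split_size))  # 80 map tasks
```

This only bounds the amount of parallel work, not the wall-clock time, which still depends on cluster capacity as described above.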
Answer 2:
The job execution time and the task execution times are available in the JobTracker web UI; hope that is what you are looking for. The web UI is available on port 50030 of your JobTracker. If it is a YARN-based setup, the URL would be http://<resourcemanager-host>:8088.
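On a YARN setup, the same information is also exposed programmatically through the ResourceManager REST API at `/ws/v1/cluster/apps`, where each application reports an `elapsedTime` field in milliseconds. A small sketch of reading it (the ResourceManager URL is an assumption you would replace with your own host):

```python
import json
from urllib.request import urlopen

def elapsed_times(payload: dict) -> dict:
    """Extract {application id: elapsed time in ms} from a /ws/v1/cluster/apps response."""
    apps = (payload.get("apps") or {}).get("app") or []
    return {app["id"]: app["elapsedTime"] for app in apps}

def fetch_app_times(rm_url: str) -> dict:
    """Query the YARN ResourceManager REST API for per-application elapsed times."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps") as resp:
        return elapsed_times(json.load(resp))

# Usage (hypothetical host): fetch_app_times("http://resourcemanager:8088")
```

Note this still only tells you execution time after (or while) the job runs; it does not predict it beforehand, which is the point of Answer 1.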
Source: https://stackoverflow.com/questions/26876261/why-cant-we-calculate-job-execution-time-in-hadoop