Question
My question is related to the straggler problem. Sorting is an algorithm whose complexity is known, so we can calculate its running time when it is executed on a fixed data set.
Why can't we determine job execution time in Hadoop the same way?
If we could know the job execution time or task execution time in advance, we could identify straggler tasks quickly, without needing special algorithms to detect which tasks are stragglers.
Answer 1:
You cannot reliably estimate how long a job will take before running it. After running your MapReduce job, you can measure the time it took. MapReduce performance always depends on your cluster capacity (RAM size, CPU cores, and network bandwidth) and on how many reducers you set for the job.
You can only make rough assumptions based on your RAM size and the input split size.
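One piece of such a rough estimate is the number of map tasks, which is roughly one per input split. A minimal sketch (the 10 GB input size and 128 MB split size below are hypothetical values for illustration, not figures from the question):

```python
import math

def estimate_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """Rough estimate: Hadoop launches roughly one map task per input split."""
    return math.ceil(input_bytes / split_bytes)

# Hypothetical example: 10 GB of input with a 128 MB split size.
input_size = 10 * 1024**3
split_size = 128 * 1024**2
print(estimate_map_tasks(input_size, split_size))  # 80 map tasks
```

This only bounds the amount of parallel work, not the wall-clock time, which still depends on cluster capacity as described above.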
Answer 2:
The job execution time and the task execution times are available in the JobTracker web UI; hope that is what you are looking for. The web UI is available on port 50030 of your JobTracker. If it is a YARN-based setup, the URL would be http://<resourcemanager-host>:8088.
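On a YARN setup, the same information is also exposed programmatically through the ResourceManager REST API at `/ws/v1/cluster/apps`, where each application reports an `elapsedTime` field in milliseconds. A small sketch of reading it (the ResourceManager URL is an assumption you would replace with your own host):

```python
import json
from urllib.request import urlopen

def elapsed_times(payload: dict) -> dict:
    """Extract {application id: elapsed time in ms} from a /ws/v1/cluster/apps response."""
    apps = (payload.get("apps") or {}).get("app") or []
    return {app["id"]: app["elapsedTime"] for app in apps}

def fetch_app_times(rm_url: str) -> dict:
    """Query the YARN ResourceManager REST API for per-application elapsed times."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps") as resp:
        return elapsed_times(json.load(resp))

# Usage (hypothetical host): fetch_app_times("http://resourcemanager:8088")
```

Note this still only tells you execution time after (or while) the job runs; it does not predict it beforehand, which is the point of Answer 1.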
Source: https://stackoverflow.com/questions/26876261/why-cant-we-calculate-job-execution-time-in-hadoop