Parallelism in Spark Job Server


Question


We are working on Qubole with Spark version 2.0.2.

We have a multi-step process in which every intermediate step writes its output to HDFS, and this output is later consumed by the reporting layer.

For our use case, we want to avoid these intermediate writes to HDFS, keep all the intermediate output as temporary tables in Spark, and write out only the final reporting-layer output.
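
For illustration, this is roughly what we mean by keeping intermediate output as temporary tables; the table names, filter, and output path below are placeholders rather than our actual pipeline:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("reporting-pipeline").getOrCreate()

// Intermediate step: register its result as a temp view instead of writing to HDFS.
val step1 = spark.sql("SELECT * FROM source_table WHERE event_date = '2017-05-01'")
step1.createOrReplaceTempView("step1_output")   // visible only within this SparkSession
step1.cache()                                   // optional: avoid recomputing it downstream

// The next step reads the temp view, again without touching HDFS.
val step2 = spark.sql("SELECT key, count(*) AS cnt FROM step1_output GROUP BY key")
step2.createOrReplaceTempView("step2_output")

// Only the final reporting-layer output is persisted.
spark.table("step2_output").write.mode("overwrite").parquet("/reports/final_output")
```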

For this implementation, we wanted to use the Job Server provided by Qubole, but when we trigger multiple queries on the Job Server, it runs the jobs sequentially.

I have observed the same behavior on a Databricks cluster as well.

The cluster we are using has 30 r4.2xlarge nodes.
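
To clarify what we are attempting: as far as we understand, within a single SparkContext jobs only run concurrently if their actions are submitted from separate threads, optionally with the FAIR scheduler so they share executors. Below is a minimal sketch of that pattern; the query strings and temp-view names are placeholders, and the FAIR scheduler setting is our own assumption, not something the Job Server configures for us:

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val spark = SparkSession.builder()
  .appName("parallel-queries")
  .config("spark.scheduler.mode", "FAIR")   // let concurrent jobs share executor slots
  .getOrCreate()

// Placeholder queries against the temp views from the previous sketch.
val queries = Seq(
  "SELECT count(*) FROM step1_output",
  "SELECT count(*) FROM step2_output"
)

// Each Future runs an action on its own thread, so Spark can schedule the jobs concurrently.
val futures = queries.map(q => Future(spark.sql(q).collect()))
Await.result(Future.sequence(futures), Duration.Inf)
```

This is how we expected concurrent queries to behave; on the Job Server, however, the requests still appear to execute one after another.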

Does anyone have experience running multiple jobs in parallel using the Job Server?

Any help from the community would be greatly appreciated!

Source: https://stackoverflow.com/questions/43738862/parallelism-in-spark-job-server
