spark-jobserver

Parallelism in Spark Job Server

谁说我不能喝 submitted on 2019-12-23 02:49:04
Question: We are working on Qubole with Spark version 2.0.2. We have a multi-step process in which every intermediate step writes its output to HDFS, and that output is later consumed by the reporting layer. For our use case we want to avoid writing to HDFS, keep all the intermediate output as temporary tables in Spark, and write only the final reporting-layer output. For this implementation we wanted to use the Job Server provided by Qubole, but when we trigger multiple queries on the Job Server, it runs our jobs sequentially.
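A minimal sketch of the intermediate-as-temp-table approach the question describes, assuming Spark 2.0.2's SparkSession API; the input path, column names, and view names are made up for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("multi-step-pipeline").getOrCreate()
    import spark.implicits._

    // Step 1: read the raw input and register the result as an in-session view
    // instead of writing it to HDFS (path and filter are hypothetical).
    val step1 = spark.read.parquet("hdfs:///input/events").filter($"status" === "ok")
    step1.createOrReplaceTempView("step1_out")

    // Step 2: later steps query the temp view like a table.
    spark.sql("SELECT status, count(*) AS cnt FROM step1_out GROUP BY status")
      .createOrReplaceTempView("step2_out")

    // Only the final reporting output is persisted.
    spark.table("step2_out").write.mode("overwrite").parquet("hdfs:///reports/final")

Temp views live only in the session that created them, which is why a long-lived shared context (the job-server model) is needed for later jobs to see them.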

Deploy Apache Spark application from another application in Java, best practice

北城以北 submitted on 2019-12-21 02:20:52
Question: I am a new user of Spark. I have a web service that allows a user to ask the server to perform complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications. However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to launch the corresponding Spark application?
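One commonly suggested alternative to shelling out to spark-submit is Spark's programmatic SparkLauncher API (in the spark-launcher artifact). A hedged sketch; the jar path, class name, and master URL are placeholders:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    // Launches the analysis job as a child Spark application and returns a
    // handle the web service can use to track its state.
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/opt/jobs/analysis-job.jar")   // hypothetical jar
      .setMainClass("com.example.AnalysisJob")        // hypothetical main class
      .setMaster("spark://master-host:7077")          // placeholder master URL
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .startApplication()

    // handle.getState can be polled, or a SparkAppHandle.Listener can be
    // registered for asynchronous state updates.

The other option frequently raised for this use case is a REST layer such as spark-jobserver, the subject of this page, which keeps a long-lived context instead of launching a new application per request.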

Spark flattening out dataframes

て烟熏妆下的殇ゞ submitted on 2019-12-11 05:20:59
Question: Getting started with Spark, I would like to know how to flatMap or explode a DataFrame. It was created using df.groupBy("columnName").count and has the following structure if I collect it: [[Key1, count], [Key2, count2]] But I would rather have something like Map(bar -> 1, foo -> 1, awesome -> 1). What is the right tool to achieve something like this? flatMap, explode, or something else? Context: I want to use spark-jobserver. It only seems to provide meaningful results (e.g. a working …
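A minimal sketch of one way to get that Map: collect the grouped counts to the driver and convert the rows, assuming the result is small enough to collect (the sample data is made up):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("counts-to-map").getOrCreate()
    import spark.implicits._

    val df = Seq("bar", "foo", "awesome").toDF("word")
    val counts = df.groupBy("word").count()   // rows of (word: String, count: Long)

    // No flatMap/explode needed: collect and convert on the driver.
    val asMap: Map[String, Long] =
      counts.collect().map(r => (r.getString(0), r.getLong(1))).toMap
    // Map(bar -> 1, foo -> 1, awesome -> 1)

A plain Scala Map is also a convenient shape to return from a job-server job, since it tends to serialize cleanly to JSON in the response.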

The error “Invalid job type for this context” in a Spark SQL job with Spark Job Server

自古美人都是妖i submitted on 2019-12-10 17:36:19
Question: I created a Spark SQL job with Spark Job Server, using HiveContext, following the sample below: https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server-extras/src/spark.jobserver/HiveTestJob.scala I was able to start the server, but when I run my application (my Scala class, which extends SparkSqlJob), I get the following response: { "status": "ERROR", "result": "Invalid job type for this context" } Can anyone suggest what is going wrong, or provide a detailed …
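For context: spark-jobserver returns this error when the job type does not match the factory the context was created with. A hedged sketch of creating a matching context over the REST API, assuming the job-server-extras context factories; host, port, and names are placeholders:

    # A SparkSqlJob needs a context built by the SQL context factory...
    curl -d "" 'http://localhost:8090/contexts/sql-context?context-factory=spark.jobserver.context.SQLContextFactory'

    # ...while a HiveContext-based job (like HiveTestJob) needs the Hive factory:
    curl -d "" 'http://localhost:8090/contexts/hive-context?context-factory=spark.jobserver.context.HiveContextFactory'

    # The job must then be submitted against that context explicitly:
    curl -d "" 'http://localhost:8090/jobs?appName=myApp&classPath=com.example.MySqlJob&context=sql-context'

Running the job without the context parameter lets the server create a plain SparkContext on the fly, which is one way to end up with the type mismatch.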

DSE 4.6 to DSE 4.7 Failed to find Spark assembly

自闭症网瘾萝莉.ら submitted on 2019-12-08 10:08:42
Question: I have a problem with job-server-0.5.0 after upgrading DSE 4.6 to 4.7. If I run server_start.sh I get the error "Failed to find Spark assembly in /usr/share/dse/spark/assembly/target/scala-2.10 You need to build Spark before running this program." I found that this code in /usr/share/dse/spark/bin/compute-classpath.sh raises the error:

    for f in ${assembly_folder}/spark-assembly*hadoop*.jar; do
      if [[ ! -e "$f" ]]; then
        echo "Failed to find Spark assembly in $assembly_folder" 1>&2
        echo "You need to build Spark before running this program." 1>&2
        exit 1
      fi
    done
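The check fails because the assembly jar is no longer where the script's baked-in path expects it. A hedged first diagnostic step (not a verified fix) is to find where DSE 4.7 actually keeps the assembly, then point the job server's deployment settings (SPARK_HOME in its settings file) at a Spark layout that contains it; the paths below are assumptions:

    # Locate the Spark assembly jar under the DSE installation:
    find /usr/share/dse -name 'spark-assembly*hadoop*.jar' 2>/dev/null

    # Then, in the job-server deployment settings (e.g. local.sh):
    SPARK_HOME=/usr/share/dse/spark   # assumed path to DSE's bundled Spark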

Parallelism in Spark Job Server

邮差的信 submitted on 2019-12-08 04:52:32
We are working on Qubole with Spark version 2.0.2. We have a multi-step process in which every intermediate step writes its output to HDFS, and that output is later consumed by the reporting layer. For our use case we want to avoid writing to HDFS, keep all the intermediate output as temporary tables in Spark, and write only the final reporting-layer output. For this implementation we wanted to use the Job Server provided by Qubole, but when we try to trigger multiple queries on the Job Server, it runs our jobs sequentially. I have observed the same behavior in Databricks.
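On the spark-jobserver side, per-context job concurrency is a configuration concern. A hedged sketch of the settings usually cited for running jobs in parallel within one context; treat the exact keys and values as assumptions for your job-server version:

    # job-server application.conf (HOCON)
    spark {
      jobserver {
        # How many jobs may run simultaneously in a single context.
        max-jobs-per-context = 8
      }
      context-settings {
        # Let concurrent jobs share the context's resources fairly
        # instead of being FIFO-scheduled one after another.
        spark.scheduler.mode = FAIR
      }
    }

Even with these set, jobs submitted to the same context share that context's executors, so true parallelism also depends on how many cores the context has.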

Apache Spark REST API

半世苍凉 submitted on 2019-12-05 00:50:08
Question: I'm invoking spark-submit with driver-side log4j properties like this:

    /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit \
      --driver-java-options "-Dlog4j.configuration=file:/home/test_api/log4j-driver.properties" \
      --class Test testing.jar

How do I pass --driver-java-options when submitting a job via curl (Apache Spark's hidden REST API)? I tried this:

    curl -X POST http://host-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '…

Apache Spark REST API

被刻印的时光 ゝ submitted on 2019-12-03 17:04:12
I'm invoking spark-submit with driver-side log4j properties like this:

    /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit \
      --driver-java-options "-Dlog4j.configuration=file:/home/test_api/log4j-driver.properties" \
      --class Test testing.jar

How do I pass --driver-java-options when submitting a job via curl (Apache Spark's hidden REST API)? I tried this:

    curl -X POST http://host-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '{
        "action" : "CreateSubmissionRequest",
        "appArgs" : [ "" ],
        "appResource" : "hdfs://host-ip:9000/test…
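For reference, the submission payload has no field corresponding to --driver-java-options; the usual equivalent is the spark.driver.extraJavaOptions property inside sparkProperties. A hedged sketch of a full request: the host names and Spark version echo the question's placeholders, the jar path completes the question's truncated hdfs://host-ip:9000/test… as an assumption, and the remaining field values are assumptions as well:

    curl -X POST http://host-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '{
        "action" : "CreateSubmissionRequest",
        "clientSparkVersion" : "1.6.2",
        "mainClass" : "Test",
        "appResource" : "hdfs://host-ip:9000/test/testing.jar",
        "appArgs" : [ "" ],
        "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
        "sparkProperties" : {
          "spark.app.name" : "Test",
          "spark.master" : "spark://host-ip:7077",
          "spark.jars" : "hdfs://host-ip:9000/test/testing.jar",
          "spark.driver.extraJavaOptions" : "-Dlog4j.configuration=file:/home/test_api/log4j-driver.properties"
        }
      }'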