spark-jobserver

Parallelism in Spark Job Server

谁说我不能喝 submitted on 2019-12-23 02:49:04
Question: We are working on Qubole with Spark version 2.0.2. We have a multi-step process in which every intermediate step writes its output to HDFS, and that output is later consumed by the reporting layer. For our use case we want to avoid writing to HDFS, keep all the intermediate output as temporary tables in Spark, and write only the final reporting-layer output. For this implementation we wanted to use the Job Server provided by Qubole, but when we trigger multiple queries on the Job Server, it runs our jobs sequentially.
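A minimal sketch of the intermediate-as-temp-table approach the question describes, assuming Spark 2.0.2's SparkSession API; the input path, column names, and view names are made up for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("multi-step-pipeline").getOrCreate()
    import spark.implicits._

    // Step 1: read the raw input and register the result as an in-session view
    // instead of writing it to HDFS (path and filter are hypothetical).
    val step1 = spark.read.parquet("hdfs:///input/events").filter($"status" === "ok")
    step1.createOrReplaceTempView("step1_out")

    // Step 2: later steps query the temp view like a table.
    spark.sql("SELECT status, count(*) AS cnt FROM step1_out GROUP BY status")
      .createOrReplaceTempView("step2_out")

    // Only the final reporting output is persisted.
    spark.table("step2_out").write.mode("overwrite").parquet("hdfs:///reports/final")

Temp views live only in the session that created them, which is why a long-lived shared context (the job-server model) is needed for later jobs to see them.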

Deploy Apache Spark application from another application in Java, best practice

北城以北 submitted on 2019-12-21 02:20:52
Question: I am a new user of Spark. I have a web service that allows a user to ask the server to perform complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications. However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to launch the corresponding Spark application?
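One commonly suggested alternative to shelling out to spark-submit is Spark's programmatic SparkLauncher API (in the spark-launcher artifact). A hedged sketch; the jar path, class name, and master URL are placeholders:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    // Launches the analysis job as a child Spark application and returns a
    // handle the web service can use to track its state.
    val handle: SparkAppHandle = new SparkLauncher()
      .setAppResource("/opt/jobs/analysis-job.jar")   // hypothetical jar
      .setMainClass("com.example.AnalysisJob")        // hypothetical main class
      .setMaster("spark://master-host:7077")          // placeholder master URL
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .startApplication()

    // handle.getState can be polled, or a SparkAppHandle.Listener can be
    // registered for asynchronous state updates.

The other option frequently raised for this use case is a REST layer such as spark-jobserver, the subject of this page, which keeps a long-lived context instead of launching a new application per request.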

Spark flattening out dataframes

て烟熏妆下的殇ゞ submitted on 2019-12-11 05:20:59
Question: Getting started with Spark, I would like to know how to flatMap or explode a DataFrame. It was created using df.groupBy("columnName").count and has the following structure if I collect it: [[Key1, count], [Key2, count2]] But I would rather have something like Map(bar -> 1, foo -> 1, awesome -> 1). What is the right tool to achieve something like this? flatMap, explode, or something else? Context: I want to use spark-jobserver. It only seems to provide meaningful results (e.g. a working …
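A minimal sketch of one way to get that Map: collect the grouped counts to the driver and convert the rows, assuming the result is small enough to collect (the sample data is made up):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("counts-to-map").getOrCreate()
    import spark.implicits._

    val df = Seq("bar", "foo", "awesome").toDF("word")
    val counts = df.groupBy("word").count()   // rows of (word: String, count: Long)

    // No flatMap/explode needed: collect and convert on the driver.
    val asMap: Map[String, Long] =
      counts.collect().map(r => (r.getString(0), r.getLong(1))).toMap
    // Map(bar -> 1, foo -> 1, awesome -> 1)

A plain Scala Map is also a convenient shape to return from a job-server job, since it tends to serialize cleanly to JSON in the response.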

The error “Invalid job type for this context” in a Spark SQL job with Spark Job Server

自古美人都是妖i submitted on 2019-12-10 17:36:19
Question: I created a Spark SQL job with Spark Job Server, using HiveContext, following the sample below: https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server-extras/src/spark.jobserver/HiveTestJob.scala I was able to start the server, but when I run my application (my Scala class, which extends SparkSqlJob), I get the following response: { "status": "ERROR", "result": "Invalid job type for this context" } Can anyone suggest what is going wrong, or provide a detailed …
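For context: spark-jobserver returns this error when the job type does not match the factory the context was created with. A hedged sketch of creating a matching context over the REST API, assuming the job-server-extras context factories; host, port, and names are placeholders:

    # A SparkSqlJob needs a context built by the SQL context factory...
    curl -d "" 'http://localhost:8090/contexts/sql-context?context-factory=spark.jobserver.context.SQLContextFactory'

    # ...while a HiveContext-based job (like HiveTestJob) needs the Hive factory:
    curl -d "" 'http://localhost:8090/contexts/hive-context?context-factory=spark.jobserver.context.HiveContextFactory'

    # The job must then be submitted against that context explicitly:
    curl -d "" 'http://localhost:8090/jobs?appName=myApp&classPath=com.example.MySqlJob&context=sql-context'

Running the job without the context parameter lets the server create a plain SparkContext on the fly, which is one way to end up with the type mismatch.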

DSE 4.6 to DSE 4.7 Failed to find Spark assembly

自闭症网瘾萝莉.ら submitted on 2019-12-08 10:08:42
Question: I have a problem with job-server-0.5.0 after upgrading DSE 4.6 to 4.7. If I run server_start.sh I get the error "Failed to find Spark assembly in /usr/share/dse/spark/assembly/target/scala-2.10 You need to build Spark before running this program." I found that this code in /usr/share/dse/spark/bin/compute-classpath.sh raises the error:

    for f in ${assembly_folder}/spark-assembly*hadoop*.jar; do
      if [[ ! -e "$f" ]]; then
        echo "Failed to find Spark assembly in $assembly_folder" 1>&2
        echo "You need to build Spark before running this program." 1>&2
        exit 1
      fi
    done
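The check fails because the assembly jar is no longer where the script's baked-in path expects it. A hedged first diagnostic step (not a verified fix) is to find where DSE 4.7 actually keeps the assembly, then point the job server's deployment settings (SPARK_HOME in its settings file) at a Spark layout that contains it; the paths below are assumptions:

    # Locate the Spark assembly jar under the DSE installation:
    find /usr/share/dse -name 'spark-assembly*hadoop*.jar' 2>/dev/null

    # Then, in the job-server deployment settings (e.g. local.sh):
    SPARK_HOME=/usr/share/dse/spark   # assumed path to DSE's bundled Spark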

Parallelism in Spark Job Server

邮差的信 submitted on 2019-12-08 04:52:32
We are working on Qubole with Spark version 2.0.2. We have a multi-step process in which every intermediate step writes its output to HDFS, and that output is later consumed by the reporting layer. For our use case we want to avoid writing to HDFS, keep all the intermediate output as temporary tables in Spark, and write only the final reporting-layer output. For this implementation we wanted to use the Job Server provided by Qubole, but when we try to trigger multiple queries on the Job Server, it runs our jobs sequentially. I have observed the same behavior in Databricks.
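On the spark-jobserver side, per-context job concurrency is a configuration concern. A hedged sketch of the settings usually cited for running jobs in parallel within one context; treat the exact keys and values as assumptions for your job-server version:

    # job-server application.conf (HOCON)
    spark {
      jobserver {
        # How many jobs may run simultaneously in a single context.
        max-jobs-per-context = 8
      }
      context-settings {
        # Let concurrent jobs share the context's resources fairly
        # instead of being FIFO-scheduled one after another.
        spark.scheduler.mode = FAIR
      }
    }

Even with these set, jobs submitted to the same context share that context's executors, so true parallelism also depends on how many cores the context has.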

Apache Spark REST API

半世苍凉 submitted on 2019-12-05 00:50:08
Question: I'm invoking spark-submit with driver-side log4j properties like this:

    /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit \
      --driver-java-options "-Dlog4j.configuration=file:/home/test_api/log4j-driver.properties" \
      --class Test testing.jar

How do I pass --driver-java-options when submitting a job via curl (Apache Spark's hidden REST API)? I tried this:

    curl -X POST http://host-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '…

Apache Spark REST API

被刻印的时光 ゝ submitted on 2019-12-03 17:04:12
I'm invoking spark-submit with driver-side log4j properties like this:

    /opt/spark-1.6.2-bin-hadoop2.6/bin/spark-submit \
      --driver-java-options "-Dlog4j.configuration=file:/home/test_api/log4j-driver.properties" \
      --class Test testing.jar

How do I pass --driver-java-options when submitting a job via curl (Apache Spark's hidden REST API)? I tried this:

    curl -X POST http://host-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '{
        "action" : "CreateSubmissionRequest",
        "appArgs" : [ "" ],
        "appResource" : "hdfs://host-ip:9000/test…
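For reference, the submission payload has no field corresponding to --driver-java-options; the usual equivalent is the spark.driver.extraJavaOptions property inside sparkProperties. A hedged sketch of a full request: the host names and Spark version echo the question's placeholders, the jar path completes the question's truncated hdfs://host-ip:9000/test… as an assumption, and the remaining field values are assumptions as well:

    curl -X POST http://host-ip:6066/v1/submissions/create \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data '{
        "action" : "CreateSubmissionRequest",
        "clientSparkVersion" : "1.6.2",
        "mainClass" : "Test",
        "appResource" : "hdfs://host-ip:9000/test/testing.jar",
        "appArgs" : [ "" ],
        "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
        "sparkProperties" : {
          "spark.app.name" : "Test",
          "spark.master" : "spark://host-ip:7077",
          "spark.jars" : "hdfs://host-ip:9000/test/testing.jar",
          "spark.driver.extraJavaOptions" : "-Dlog4j.configuration=file:/home/test_api/log4j-driver.properties"
        }
      }'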