Question
I'm submitting Spark jobs to a Livy (0.6.0) session through cURL.
The job is a big jar file with a class that extends the Job interface, exactly like this answer: https://stackoverflow.com/a/49220879/8557851
I'm running it with the following curl command:
curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/
The code itself is exactly like the answer linked above:
package com.mycompany.test

import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    // Print all Spark configuration entries, then report success
    val sc = jc.sc
    sc.getConf.getAll.foreach(println)
    true
  }
}
The error is a Java NullPointerException, as shown below:
Exception in thread "main" java.lang.NullPointerException
at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The expected output is for the job in the jar to start running.
Answer 1:
I've used the Livy REST APIs, and there are two approaches to submitting a Spark job. Please refer to the REST API docs; they give a fair understanding of Livy REST requests:
1. Batch (/batches):
You submit a request and get a job id; based on that id you poll for the status of the Spark job (see the sketch after the batch example below). Here you have the option to execute an uber jar as well as a code file, but I've never used the latter.
2. Session (/sessions and /sessions/{sessionId}/statements):
You submit the Spark job as code, so there is no need to create an uber jar. Here, you first create a session, and in that session you execute one or more statements (the actual code).
For both approaches, the documentation has a good explanation of the corresponding REST requests and their request body/parameters.
Examples/samples are available in the Livy docs as well.
The correction to your code would be:
Batch
curl \
-X POST \
-d '{
"kind": "spark",
"files": [
"<use-absolute-path>"
],
"file": "absolute-path-to-your-application-jar",
"className": "fully-qualified-spark-class-name",
"driverMemory": "512M",
"executorMemory": "512M",
"conf": {<any-other-configs-as-key-val>}
}' \
-H "Content-Type: application/json" \
localhost:8998/batches/
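Filled in for the jar and class from the question, the batch request might look like the sketch below. Note the assumption here: for /batches the className must point to a class with a standard main method, since the request goes through spark-submit; the Livy Job interface shown in the question is meant for Livy's programmatic client API instead.
curl \
-X POST \
-d '{
"files": ["/config.json"],
"file": "/myjar.jar",
"className": "com.mycompany.test.Test",
"driverMemory": "512M",
"executorMemory": "512M"
}' \
-H "Content-Type: application/json" \
localhost:8998/batches/
// Poll the batch status using the "id" returned by the POST above
curl localhost:8998/batches/{batchId}/state
The /state response is a small JSON object with the batch id and a state such as "starting", "running", "dead", or "success".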
Session and Statement
// Create a session
curl \
-X POST \
-d '{
"kind": "spark",
"files": [
"<use-absolute-path>"
],
"driverMemory": "512M",
"executorMemory": "512M",
"conf": {<any-other-configs-as-key-val>}
}' \
-H "Content-Type: application/json" \
localhost:8998/sessions/
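A session is not ready immediately; it should reach the "idle" state before you post statements. A minimal polling sketch, assuming {sessionId} is the "id" field returned by the session POST above:
// Poll until the session state becomes "idle"
curl localhost:8998/sessions/{sessionId}/state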
// Run code/statement in session created above
curl \
-X POST \
-d '{
"kind": "spark",
"code": "spark-code"
}' \
-H "Content-Type: application/json" \
localhost:8998/sessions/{sessionId}/statements
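The statement POST returns a statement id, and the statement's state and output can then be polled. A sketch, assuming the ids from the previous responses (for the question's job, "spark-code" could be e.g. "sc.getConf.getAll.foreach(println)"):
// Poll the statement for its state and output
curl localhost:8998/sessions/{sessionId}/statements/{statementId}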
Source: https://stackoverflow.com/questions/57754668/submitting-spark-jobs-over-livy-using-curl