Apache Livy cURL not working for spark-submit command


Question


I recently started working with Spark, Scala, HDFS, sbt, and Livy. Currently I am trying to create a Livy batch.

Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This is the error that appears in the Livy batch log.

My spark-submit command works perfectly fine with the local .jar file:

spark-submit --class "SimpleApp" --master local target/scala-2.11/simple-project_2.11-1.0.jar

But the same job submitted through Livy (via cURL) throws an error:

"requirement failed: Local path /target/scala-2.11/simple-project_2.11-1.0.jar cannot be added to user sessions."

So I moved the .jar file to HDFS. My new Livy request is:

curl -X POST --data '{
    "file": "/jar/project.jar",
    "className": "SimpleApp",
    "args": ["ddd"]
}' \
-H "Content-Type: application/json" \
http://server:8998/batches

This throws the error mentioned above.

Please let me know where I am going wrong.

Thanks in advance!


Answer 1:


hdfs://localhost:9001/jar/project.jar.

It's expecting your jar file to be located on HDFS.

If it's local, you should try specifying the protocol in the path, or just upload the jar to HDFS:

 "file": "file:///absolute_path/jar/project.jar",



Answer 2:


You have to build a fat jar containing your codebase plus the necessary dependency jars (via sbt assembly or the equivalent Maven plugin), upload this jar to HDFS, and run spark-submit with the jar that now sits on HDFS; you can use cURL for this as well.

Steps with Scala/Java:

  1. Build a fat jar with sbt/Maven or any other build tool (sketched after the cURL example below).
  2. Upload the fat jar to HDFS.
  3. Use cURL to submit the job:

curl -X POST --data '{ ... your request body here ... }' -H "Content-Type: application/json" your_ip:8998/batches
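A rough sketch of steps 1 and 2 (the sbt-assembly version, the paths, and the jar name are assumptions; by default sbt-assembly names the output <project>-assembly-<version>.jar):

// project/plugins.sbt (assumed plugin version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

# build the fat jar and copy it to HDFS (assumed paths)
sbt assembly
hdfs dfs -mkdir -p /jar
hdfs dfs -put target/scala-2.11/simple-project-assembly-1.0.jar /jar/project.jar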

If you don't want to build a fat jar and upload it to HDFS, you can consider Python: the code can be submitted as plain text, without any jar file.

An example with plain Python code:

curl your_ip:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"print(\"asdf\")"}'

In the data body you have to send valid Python code. This is the way tools like Jupyter Notebook/Torch work.
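Note that an interactive session has to exist before statements can be posted to /sessions/0/statements. A minimal sketch for creating a PySpark session first:

# create a session (kind can be e.g. spark, pyspark, or sparkr)
curl your_ip:8998/sessions -X POST -H 'Content-Type: application/json' -d '{"kind": "pyspark"}'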

I also made one more example with Livy and Python. To check the results:

curl your_ip:8998/sessions/0/statements/1
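Once the statement has finished, the response is JSON of roughly this shape (illustrative only; ids and counters will differ):

{"id": 1, "state": "available", "output": {"status": "ok", "execution_count": 1, "data": {"text/plain": "asdf"}}}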

As I mentioned above, for Scala/Java a fat jar and an upload to HDFS are required.




Answer 3:


To use local files for Livy batch jobs, you need to add the local folder to the livy.file.local-dir-whitelist property in livy.conf.

Description from livy.conf.template:

List of local directories from where files are allowed to be added to user sessions. By default it's empty, meaning users can only reference remote URIs when starting their sessions.
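For example (the whitelisted directory is an assumption; point it at the folder that actually contains your jar, and restart the Livy server so the change takes effect):

# livy.conf (assumed directory)
livy.file.local-dir-whitelist = /home/user/simple-project/target/scala-2.11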



Source: https://stackoverflow.com/questions/50969333/apache-livy-curl-not-working-for-spark-submit-command
