Question
I recently started working with Spark, Scala, HDFS, sbt, and Livy. I am currently trying to create a Livy batch job.
Warning: Skip remote jar hdfs://localhost:9001/jar/project.jar.
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is the error shown in the Livy batch log.
My spark-submit command works perfectly fine with the local .jar file:
spark-submit --class "SimpleApp" --master local target/scala-2.11/simple-project_2.11-1.0.jar
But the same job submitted through Livy (via cURL) throws an error:
"requirement failed: Local path /target/scala-2.11/simple-project_2.11-1.0.jar cannot be added to user sessions."
So I moved the .jar file to HDFS. My new Livy request is:
curl -X POST --data '{
  "file": "/jar/project.jar",
  "className": "SimpleApp",
  "args": ["ddd"]
}' \
-H "Content-Type: application/json" \
http://server:8998/batches
This throws the error mentioned above.
Please let me know where I am going wrong.
Thanks in advance!
Answer 1:
hdfs://localhost:9001/jar/project.jar.
Livy is expecting your jar file to be located on HDFS.
If the jar is local, try specifying the protocol in the path, or just upload the jar to HDFS:
"file": "file:///absolute_path/jar/project.jar",
Answer 2:
You have to build a fat jar containing your codebase plus the necessary dependency jars (with sbt assembly or a Maven plugin), upload that jar to HDFS, and run spark-submit with the jar that is on HDFS; you can use cURL for the submission as well.
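For the sbt route, a minimal sbt-assembly setup is sketched below (the plugin version is illustrative):

// in project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

Running sbt assembly then produces a single jar under target/scala-2.11/ that contains your classes together with their dependencies.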
Steps with Scala/Java:
- Make a fat jar with sbt/Maven or whatever.
- Upload the fat jar to HDFS.
- Use cURL for submitting jobs:
curl -X POST --data '{ //your data should be here}' -H "Content-Type: application/json" your_ip:8998/batches
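A filled-in version of that request, as a sketch (the HDFS path and jar name are illustrative):

curl -X POST --data '{
  "file": "hdfs:///jar/project.jar",
  "className": "SimpleApp"
}' -H "Content-Type: application/json" your_ip:8998/batches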
If you don't want to build a fat jar and upload it to HDFS, you can consider Python scripts; the code can be submitted as plain text, without any jar file.
An example with plain Python code:
curl your_ip:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"print(\"asdf\")"}'
In the data body, you have to send valid Python code.
This is the way tools like Jupyter Notebook/Torch work.
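Note that this posts a statement to an already-running interactive session (id 0 here). If no session exists yet, one can be created first, for example:

curl your_ip:8998/sessions -X POST -H 'Content-Type: application/json' -d '{"kind": "pyspark"}'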
I also made one more example with Livy and Python, for checking results:
curl your_ip:8998/sessions/0/statements/1
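The response is a JSON document describing the statement; for the print example above, its rough shape (illustrative, details vary by Livy version) would be:

{"id": 1, "state": "available", "output": {"status": "ok", "execution_count": 1, "data": {"text/plain": "asdf"}}}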
As I mentioned above, for Scala/Java a fat jar and an upload to HDFS are required.
Answer 3:
To use local files for Livy batch jobs, you need to add the local folder to the livy.file.local-dir-whitelist property in livy.conf.
Description from livy.conf.template:
List of local directories from where files are allowed to be added to user sessions. By default it's empty, meaning users can only reference remote URIs when starting their sessions.
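For example, whitelisting the sbt build output directory in livy.conf might look like this (the path is illustrative):

livy.file.local-dir-whitelist = /home/user/simple-project/target/scala-2.11

After restarting the Livy server, the original request with a local "file" path under that directory should be accepted.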
Source: https://stackoverflow.com/questions/50969333/apache-livy-curl-not-working-for-spark-submit-command