Question
I am trying to submit a Spark job to an AWS EMR cluster using the AWS console, but it fails with the error "Cannot load main class from JAR". The job runs successfully when I specify the main class with --class in the Arguments option under AWS EMR Console -> Add Step.
On my local machine, the job works fine when no main class is specified, as below:
./spark-submit /home/astro/spark-programs/SpotEMR/MyJob.jar
I set the main class in the JAR using the run configuration. The main reason I want to avoid passing the main class with --class is that I have to run this job in AWS Data Pipeline using EmrActivity, and as far as I can tell there is currently no way in AWS Data Pipeline to specify a main class for a submitted job.
Any help will be appreciated.
Answer 1:
Actually, you can pass the job's main class with EmrActivity in AWS Data Pipeline.
See https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emractivity.html for launching an EmrActivity using the step field, as well as https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html for submitting a Spark job as an EMR step with a main class.
The step would look as follows, followed by the path to your application JAR and any job arguments:
command-runner.jar,spark-submit,--class,org.apache.spark.examples.SparkPi
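To make the comma-separated form concrete, here is a minimal sketch that builds such a step string and places it in an EmrActivity object for a pipeline definition. The class name com.example.MyJob, the S3 paths, and the ids MySparkActivity and MyEmrCluster are hypothetical placeholders, and build_spark_step is an illustrative helper rather than part of any AWS SDK. Rejecting commas inside arguments is a simplification made for this sketch.

```python
import json


def build_spark_step(main_class, app_jar, app_args=()):
    """Join spark-submit arguments into one comma-separated step string."""
    parts = ["command-runner.jar", "spark-submit", "--class", main_class, app_jar, *app_args]
    # The step string is split on commas, so this sketch simply refuses
    # arguments that contain one instead of trying to escape them.
    for part in parts:
        if "," in part:
            raise ValueError(f"argument contains a comma: {part!r}")
    return ",".join(parts)


# Hypothetical values; replace with your own class, JAR location, and arguments.
step = build_spark_step(
    "com.example.MyJob",
    "s3://my-bucket/jars/MyJob.jar",
    ["s3://my-bucket/input/"],
)

# Illustrative EmrActivity object; "MyEmrCluster" is assumed to be the id
# of an EmrCluster resource defined elsewhere in the same pipeline.
emr_activity = {
    "id": "MySparkActivity",
    "type": "EmrActivity",
    "runsOn": {"ref": "MyEmrCluster"},
    "step": [step],
}

print(json.dumps(emr_activity, indent=2))
```

Since the main class is passed explicitly with --class, the step should not depend on a Main-Class entry being present in the JAR manifest.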
Source: https://stackoverflow.com/questions/48407769/aws-emr-spark-error-cannot-load-main-class-from-jar