Facing an error while trying to create a transient cluster on AWS EMR to run a Python script

谁说胖子不能爱 submitted on 2020-08-10 19:17:38

Question


I am new to AWS and am trying to create a transient cluster on AWS EMR to run a Python script. I just want to run the Python script, which processes a file, and have the cluster terminate automatically once the step completes. I have also created a key pair and specified it in the command.

The command is below:

aws emr create-cluster --name "test1-cluster" --release-label emr-5.5.0 --name pyspark_analysis --ec2-attributes KeyName=k-key-pair --applications Name=Hadoop Name=Hive Name=Spark --instance-groups --use-default-roles --instance-type m5-xlarge --instance-count 2 --region us-east-1 --log-uri s3://k-test-bucket-input/logs/ --steps Type=SPARK, Name="pyspark_analysis", ActionOnFailure=CONTINUE, Args=[-deploy-mode,cluster, -master,yarn, -conf,spark.yarn.submit.waitAppCompletion=true, -executor-memory,1g, s3://k-test-bucket-input/word_count.py, s3://k-test-bucket-input/input/a.csv, s3://k-test-bucket-input/output/ ] --auto-terminate

Error message

zsh: bad pattern: Args=[

What I tried:

I checked the arguments and the spacing for accidentally introduced characters, but did not find any. My syntax is surely wrong, but I am not sure what I am missing.

What the command is expected to do:

It is expected to execute word_count.py, which reads the input file a.csv and generates the output in b.csv.


Answer 1:


I think the issue is the spaces inside the --steps value. I reformatted the command so that it is a bit easier to see where the spaces are (or the lack of them):

aws emr create-cluster \
    --name "test1-cluster" \
    --release-label emr-5.5.0 \
    --name pyspark_analysis \
    --ec2-attributes KeyName=k-key-pair \
    --applications Name=Hadoop Name=Hive Name=Spark \
    --use-default-roles \
    --instance-type m5.xlarge --instance-count 2 \
    --region us-east-1 --log-uri s3://k-test-bucket-input/logs/ \
    --steps Type=SPARK,Name="pyspark_analysis",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,--executor-memory,1g,s3://k-test-bucket-input/word_count.py,s3://k-test-bucket-input/input/a.csv,s3://k-test-bucket-input/output/] \
    --auto-terminate
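
To see why the spaces matter, here is a small Python sketch that uses the standard-library shlex module to split a --steps value into words roughly the way a POSIX-style shell would. It only approximates zsh: shlex does no glob matching, which is the stage where zsh reports "bad pattern". The two sample strings are abbreviated from the question purely for illustration.

```python
import shlex

# Abbreviated --steps value with spaces after the commas, as in the question.
spaced = ('Type=SPARK, Name="pyspark_analysis", ActionOnFailure=CONTINUE, '
          'Args=[-deploy-mode,cluster, -master,yarn]')

# The same value with every space removed.
compact = ('Type=SPARK,Name="pyspark_analysis",ActionOnFailure=CONTINUE,'
           'Args=[-deploy-mode,cluster,-master,yarn]')

# shlex.split breaks a string into words on unquoted whitespace.
spaced_words = shlex.split(spaced)
compact_words = shlex.split(compact)

print(len(spaced_words), spaced_words)
print(len(compact_words), compact_words)
```

With the spaces, the step definition reaches the command as several separate words, and one of them contains an opening [ with no closing ]. Without the spaces it stays a single word.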



Answer 2:


Try enclosing everything in quotes:

aws emr create-cluster \
    --name "test1-cluster" \
    --release-label emr-5.5.0 \
    --name pyspark_analysis \
    --ec2-attributes KeyName=k-key-pair \
    --applications Name=Hadoop Name=Hive Name=Spark \
    --use-default-roles \
    --instance-type m5.xlarge --instance-count 2 \
    --region us-east-1 --log-uri s3://k-test-bucket-input/logs/ \
    --steps Type="SPARK",Name="pyspark_analysis",ActionOnFailure="CONTINUE",Args=[--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,--executor-memory,1g,s3://k-test-bucket-input/word_count.py,s3://k-test-bucket-input/input/a.csv,s3://k-test-bucket-input/output/] \
    --auto-terminate
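
If the shell still tries to interpret the square brackets, one option is to quote the whole --steps value as a single word. Below is a minimal Python sketch of that idea, assuming the command is assembled in a script: shlex.quote from the standard library wraps the value in single quotes so it can be pasted after --steps. The helper name build_steps_value is hypothetical, and the double-dash option spellings are the ones spark-submit uses rather than the single-dash ones in the question.

```python
import shlex

# Arguments passed to spark-submit by the step; the S3 paths come from the question.
spark_args = [
    "--deploy-mode", "cluster",
    "--master", "yarn",
    "--conf", "spark.yarn.submit.waitAppCompletion=true",
    "--executor-memory", "1g",
    "s3://k-test-bucket-input/word_count.py",
    "s3://k-test-bucket-input/input/a.csv",
    "s3://k-test-bucket-input/output/",
]


def build_steps_value(name, args):
    """Hypothetical helper: join the step fields into one word with no spaces."""
    return "Type=SPARK,Name={},ActionOnFailure=CONTINUE,Args=[{}]".format(
        name, ",".join(args)
    )


steps_value = build_steps_value("pyspark_analysis", spark_args)

# shlex.quote adds single quotes because the value contains [ and ],
# which a shell could otherwise treat as a glob pattern.
quoted = shlex.quote(steps_value)

print("--steps " + quoted)
```

The printed line can replace the --steps line of the command; inside single quotes the shell passes the brackets through to the AWS CLI unchanged.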

For more information, see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html

And yes, Spark can be used:

aws emr create-cluster --name "Add Spark Step Cluster" --release-label emr-5.30.1 --applications Name=Spark \
--ec2-attributes KeyName=myKey --instance-type m5.xlarge --instance-count 3 \
--steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,/usr/lib/spark/examples/jars/spark-examples.jar,10] --use-default-roles


Source: https://stackoverflow.com/questions/62928662/facing-error-while-trying-to-create-transient-cluster-on-aws-emr-to-run-python-s
