I am trying to have 2 steps run concurrently in EMR. However, I always get the first step running and the second pending.
Part of my YARN configuration is as follows:
It looks like AWS has finally implemented this feature in EMR 5.28.0!
The parameter is called "Concurrency" in the console wizard or StepConcurrencyLevel in the API:
Specifies the number of steps that can be executed concurrently. The default value is 1. The maximum value is 256.
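As a rough sketch of how that setting might be passed from the AWS CLI: the `--step-concurrency-level` option of `aws emr create-cluster` and `aws emr modify-cluster` corresponds to StepConcurrencyLevel. The cluster name, instance type, instance count and cluster ID below are placeholders, not values from the question, and `run_or_print` is a hypothetical helper that only prints each command unless RUN=1 is set.

#!/bin/bash
# Sketch only: cluster name, instance settings and the cluster ID are placeholders.
CONCURRENCY=2

# Hypothetical helper: print the command for review; set RUN=1 to execute it.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# New cluster that allows up to $CONCURRENCY steps to run at once (EMR 5.28.0+).
create_cluster() {
  run_or_print aws emr create-cluster \
    --name "concurrent-steps-demo" \
    --release-label emr-5.28.0 \
    --applications Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --step-concurrency-level "$CONCURRENCY"
}

# Existing cluster: change the level in place. $1 is the cluster ID.
set_concurrency() {
  run_or_print aws emr modify-cluster \
    --cluster-id "$1" \
    --step-concurrency-level "$CONCURRENCY"
}

create_cluster
set_concurrency "j-XXXXXXXXXXXXX"

Run it once as-is to check the printed commands, then again with RUN=1 once the placeholders are replaced.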
AWS now allows you to run steps concurrently in EMR 5.28.0 and later. https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-emr-now-allows-you-to-run-multiple-steps-in-parallel-cancel-running-steps-and-integrate-with-aws-step-functions/
One thing to watch when doing this is resources: your applications will compete for whatever the cluster has available, and one of them may sit in the ACCEPTED state without starting until the other finishes, which defeats the purpose.
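To get a feel for whether two steps can actually fit side by side, a back-of-the-envelope estimate like the one below may help. It is only a sketch: it assumes Spark's default memory overhead of max(384 MB, 10% of the requested memory) per container, counts the driver as a YARN container (cluster deploy mode), and ignores YARN rounding containers up to its minimum allocation size. The cluster total of 12288 MB is a made-up figure, and the function names are hypothetical.

#!/bin/bash
# Rough per-application memory estimate for Spark on YARN, in MB.
# Assumption: overhead per container is max(384 MB, 10% of requested memory).

# container_mb REQUESTED_MB -> requested memory plus overhead
container_mb() {
  mem_mb=$1
  overhead=$(( mem_mb / 10 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $(( mem_mb + overhead ))
}

# app_memory_mb NUM_EXECUTORS EXECUTOR_MB DRIVER_MB
# Sums all executor containers plus one driver container.
app_memory_mb() {
  exec_total=$(( $1 * $(container_mb "$2") ))
  echo $(( exec_total + $(container_mb "$3") ))
}

# Hypothetical figures: two identical steps, each with 2 executors of 1024 MB
# and a 1024 MB driver, on a cluster offering 12288 MB to YARN.
per_app=$(app_memory_mb 2 1024 1024)
cluster_mb=12288
steps=2
needed=$(( per_app * steps ))
if [ "$needed" -le "$cluster_mb" ]; then
  echo "fits: $steps steps need about $needed MB of $cluster_mb MB"
else
  echo "too tight: $steps steps need about $needed MB but only $cluster_mb MB is available"
fi

If the estimate comes out "too tight", the second application is likely to wait in ACCEPTED until the first one releases its containers.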
There are 2 deploy modes for running an application on YARN in AWS EMR: client mode and cluster mode.
If you use client mode, only one step will be in the running state at any given time. However, there is an option that lets you run more than one step concurrently.
Try submitting your step in the mode below:
spark-submit --master yarn --deploy-mode cluster --executor-memory 1G --num-executors 2 --driver-memory 1g --executor-cores 2 --conf spark.yarn.submit.waitAppCompletion=false --class WordCount.word.App /home/hadoop/word.jar
Hope this helps.
Is it possible to have the steps run concurrently, or only serially?
Are there any tips, or anything specific I need to run, to get the jobs running concurrently?
spark-history server
On your local Mac you can run multiple YARN applications in parallel because you submit them to YARN directly. In EMR, the YARN/Spark applications are submitted through AWS's internal `command-runner.jar`, which does a bunch of extra logging, bootstrapping, etc. so that the `emr step` info shows up in the web console.
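For illustration, this is roughly what submitting a Spark job as an EMR step through `command-runner.jar` can look like from the AWS CLI. Treat it as a sketch: the cluster ID and step names are placeholders, the class and jar path are borrowed from the spark-submit example in this thread, and `run_or_print` is a hypothetical helper that only prints the command unless RUN=1 is set. Two such steps only run side by side if the cluster's step concurrency level is above 1.

#!/bin/bash
# Sketch only: the cluster ID and step names are placeholders.
CLUSTER_ID="j-XXXXXXXXXXXXX"

# Hypothetical helper: print the command for review; set RUN=1 to execute it.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# add_spark_step STEP_NAME
# Wraps spark-submit in an EMR step via command-runner.jar (CLI shorthand syntax).
add_spark_step() {
  run_or_print aws emr add-steps \
    --cluster-id "$CLUSTER_ID" \
    --steps "Type=CUSTOM_JAR,Name=$1,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[spark-submit,--deploy-mode,cluster,--class,WordCount.word.App,/home/hadoop/word.jar]"
}

add_spark_step "wordcount-a"
add_spark_step "wordcount-b"

Each call adds one step; EMR records the state and logs of each step separately, which is the bookkeeping that direct YARN submission skips.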
You could always put the step in the background. It shouldn't be a problem if you handle logging and race conditions.
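step-job.sh
#!/bin/bash
# Re-launches itself in the background so the step's command returns right away.
function main() {
    do_this   # placeholder for the real work
    do_that   # placeholder for the real work
}
if [[ "$1" == "1" ]]; then
    # Second invocation (marker "1" present): drop the marker and do the work.
    shift
    main "$@"
else
    # First invocation: re-run this script in the background with the marker,
    # quoting "$@" so arguments containing spaces survive intact.
    /bin/bash "$0" 1 "$@" &
fi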