Question
I'm starting an Elastic MapReduce cluster with the following command line:
$ elastic-mapreduce \
--create \
--num-instances "${INSTANCES}" \
--instance-type m1.medium \
--ami-version 3.0.4 \
--name "${CLUSTER_NAME}" \
--log-uri "s3://my-bucket/elasticmapreduce/logs" \
--step-name "${STEP_NAME}" \
--step-action TERMINATE_JOB_FLOW \
--jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
--arg s3://my-bucket/log-parser/code/hadoop-script.sh \
--arg "${CLUSTER_NAME}" \
--arg "${STEP_NAME}" \
--arg s3n://my-bucket/log-parser/input \
--arg s3n://my-bucket/log-parser/output
I would like to send an email from hadoop-script.sh that includes the log files, but those are written to s3://my-bucket/elasticmapreduce/logs/{JOB_FLOW_ID}. Is there a way to find out the JOB_FLOW_ID from within my shell script?
Also, is there a way to find out the job flow name and the step name? (Currently I pass them in as arguments, but that feels hacky.)
Answer 1:
Instead of a shell script, you could use a Ruby script:
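#!/usr/bin/ruby
require 'json'
require 'emr/common'  # helper library available on the EMR nodes

# Read the job flow metadata and pull out its ID (e.g. to locate the S3 logs)
job_flow = Emr::JsonInfoFile.new('job-flow')
job_flow_id = job_flow['jobFlowId']
puts job_flow_id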
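If you want to test the logic off-cluster, or would rather not depend on emr/common, a minimal sketch can read the same value straight from the JSON file and build the per-cluster log prefix the question asks about. It assumes the metadata lives in /mnt/var/lib/info/job-flow.json with a top-level jobFlowId key (which is what Emr::JsonInfoFile('job-flow') appears to wrap); the method name job_flow_log_prefix is hypothetical.

```ruby
require 'json'

# Assumption: EMR AMI 3.x nodes keep the job flow metadata in
# <INFO_DIR>/job-flow.json with a top-level "jobFlowId" key.
INFO_DIR = '/mnt/var/lib/info'

# Hypothetical helper: returns the S3 prefix where EMR writes this cluster's
# logs, i.e. "<log_uri>/<jobFlowId>". A trailing "/" on log_uri is dropped
# so the result never contains a double slash.
def job_flow_log_prefix(log_uri, info_dir = INFO_DIR)
  job_flow = JSON.parse(File.read(File.join(info_dir, 'job-flow.json')))
  "#{log_uri.chomp('/')}/#{job_flow.fetch('jobFlowId')}"
end
```

On the cluster, calling job_flow_log_prefix('s3://my-bucket/elasticmapreduce/logs') should yield the s3://my-bucket/elasticmapreduce/logs/{JOB_FLOW_ID} location, which the script can then reference when assembling the email. Using fetch rather than [] makes a missing key fail loudly instead of silently producing a path ending in "/".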
You can also get info on the job steps, for example:
step_one = Emr::JsonInfoFile.new('steps/1')
state = step_one['state']
or instance info:
instance_info = Emr::JsonInfoFile.new('instance')
is_master = instance_info['isMaster']
Basically, everything in the /mnt/var/lib/info/ directory is available through this interface.
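Since the interface is essentially a view over /mnt/var/lib/info/, a small sketch can list which metadata files a node actually has and load any of them by the same names the answer uses ('job-flow', 'instance', 'steps/1'). It assumes each name maps to a .json file under that directory (e.g. steps/1 to steps/1.json); the method names info_names and read_info are hypothetical, and the set of files may differ between AMI versions.

```ruby
require 'json'
require 'pathname'

# Assumption: the directory behind Emr::JsonInfoFile on EMR AMI 3.x nodes.
INFO_DIR = '/mnt/var/lib/info'

# Hypothetical helper: names of all JSON metadata files under info_dir,
# relative to it and without the extension, e.g. "job-flow", "steps/1".
def info_names(info_dir = INFO_DIR)
  base = Pathname.new(info_dir)
  paths = Dir.glob(File.join(info_dir, '**', '*.json'))
  paths.map { |p| Pathname.new(p).relative_path_from(base).to_s.sub(/\.json\z/, '') }.sort
end

# Hypothetical helper: parse one metadata file by name,
# e.g. read_info('instance')['isMaster'].
def read_info(name, info_dir = INFO_DIR)
  JSON.parse(File.read(File.join(info_dir, "#{name}.json")))
end
```

Printing info_names on a node is a quick way to check what metadata is there (for example, whether the step files carry the step name) before relying on a particular key.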
Source: https://stackoverflow.com/questions/22934511/how-to-know-job-flow-id-other-cluster-parameters-in-script-running-via-script-r