How to get the job flow ID and other cluster parameters in a script run via script-runner.jar

Question

I'm starting an Elastic MapReduce cluster with the following command line:

$ elastic-mapreduce \
--create \
--num-instances "${INSTANCES}" \
--instance-type m1.medium \
--ami-version 3.0.4 \
--name "${CLUSTER_NAME}" \
--log-uri "s3://my-bucket/elasticmapreduce/logs" \
--step-name "${STEP_NAME}" \
--step-action TERMINATE_JOB_FLOW \
--jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
--arg s3://my-bucket/log-parser/code/hadoop-script.sh \
--arg "${CLUSTER_NAME}" \
--arg "${STEP_NAME}" \
--arg s3n://my-bucket/log-parser/input \
--arg s3n://my-bucket/log-parser/output

I would like to send an email from hadoop-script.sh that includes the log files, but those are written to s3://my-bucket/elasticmapreduce/logs/{JOB_FLOW_ID}. Is there a way to find out the JOB_FLOW_ID from within my shell script?

Also, is there a way to find out the job flow name and the step name? (Currently I pass them as arguments, but that feels hacky.)

Answer 1:

Instead of using a shell script, you could use a Ruby script:

#!/usr/bin/ruby

require 'json'
require 'emr/common'

# Load the job flow metadata and pull out the job flow ID
job_flow = Emr::JsonInfoFile.new('job-flow')
job_flow_id = job_flow['jobFlowId']
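
Once the job flow ID is known, the log location mentioned in the question can be put together from it. The following is only a sketch: log_uri_for is a hypothetical helper, 'j-EXAMPLE' is a placeholder ID, and the layout (log URI followed by the job flow ID) is taken from the question rather than verified.

```ruby
# Hypothetical helper: join the value passed to --log-uri and the job flow ID.
# A single trailing slash on the base URI is dropped to avoid a double slash.
def log_uri_for(log_base, job_flow_id)
  "#{log_base.chomp('/')}/#{job_flow_id}"
end

# 'j-EXAMPLE' is a placeholder; in the script above it would be job_flow_id.
puts log_uri_for('s3://my-bucket/elasticmapreduce/logs', 'j-EXAMPLE')
```
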

You can also get info on the job steps, for example:

step_one = Emr::JsonInfoFile.new('steps/1')
state = step_one['state']

or instance info:

instance_info = Emr::JsonInfoFile.new('instance')
is_master = instance_info['isMaster']

Basically, everything in the /mnt/var/lib/info/ directory is available through this interface.
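
If the emr/common library is not available, the same information can probably be read straight from that directory. This is a minimal sketch under assumptions: it assumes the files are plain JSON named like job-flow.json and instance.json, and read_info is a hypothetical helper, not part of any EMR library.

```ruby
require 'json'

# Assumption: EMR nodes keep metadata as JSON files in this directory,
# for example job-flow.json and instance.json.
INFO_DIR = '/mnt/var/lib/info'

# Hypothetical helper: parse <dir>/<name>.json into a Hash.
def read_info(name, dir = INFO_DIR)
  JSON.parse(File.read(File.join(dir, "#{name}.json")))
end

# Only touch the real directory when run directly on a cluster node
# where the file actually exists.
if __FILE__ == $PROGRAM_NAME && File.exist?(File.join(INFO_DIR, 'job-flow.json'))
  puts read_info('job-flow')['jobFlowId']
end
```

Taking the directory as a parameter keeps the helper usable off-cluster, for instance against a copy of the info files.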

Source: https://stackoverflow.com/questions/22934511/how-to-know-job-flow-id-other-cluster-parameters-in-script-running-via-script-r

Tags

Hadoop

elastic-map-reduce