Question
I want to copy the output of a job from an EMR cluster to Amazon S3 programmatically. How can I use S3DistCp in Java code to do this?
Answer 1:
Hadoop's ToolRunner can run it, since S3DistCp implements Hadoop's Tool interface. Below is a usage example:
    import java.util.Arrays;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.util.ToolRunner;

    import com.amazon.external.elasticmapreduce.s3distcp.S3DistCp;

    public class CustomS3DistCP {
        private static final Log log = LogFactory.getLog(CustomS3DistCP.class);

        public static void main(String[] args) throws Exception {
            // Arrays.toString logs the argument values, not the array reference
            log.info("Running with args: " + Arrays.toString(args));
            // ToolRunner handles generic Hadoop options, then calls S3DistCp.run() with the rest
            System.exit(ToolRunner.run(new S3DistCp(), args));
        }
    }
The S3DistCp JAR must be on your classpath. You can then call this program from a shell script.
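The program above simply forwards its command-line arguments to S3DistCp. If you would rather decide the source and destination inside Java, one option is to build the argument array in code. This is a minimal sketch, assuming the documented S3DistCp option form --src=LOCATION, --dest=LOCATION and the optional --srcPattern=PATTERN; the class name S3DistCpArgs and both paths are hypothetical placeholders. It only builds and prints the array, so it runs without Hadoop on the classpath:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class S3DistCpArgs {

    /**
     * Builds an argument array for S3DistCp (hypothetical helper).
     * srcPattern is optional and is left out when null.
     */
    public static String[] build(String srcDir, String destDir, String srcPattern) {
        List<String> args = new ArrayList<>();
        args.add("--src=" + srcDir);
        args.add("--dest=" + destDir);
        if (srcPattern != null) {
            args.add("--srcPattern=" + srcPattern);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] cliArgs) {
        // Hypothetical locations: replace with your job's output directory and your bucket.
        String[] args = build("hdfs:///user/hadoop/job-output", "s3://my-bucket/job-output/", null);
        System.out.println(Arrays.toString(args));
        // With the Hadoop and S3DistCp JARs on the classpath, the same array could be passed as:
        //   int rc = ToolRunner.run(new S3DistCp(), args);
    }
}
```

Passing the built array to ToolRunner.run(new S3DistCp(), args) replaces the command-line arguments; by the usual Tool convention, a return code of 0 means the copy succeeded.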
Hope that helps!
Source: https://stackoverflow.com/questions/18124845/how-to-use-s3distcp-in-java-code