I've successfully completed a Mahout vectorizing job on Amazon EMR (using "Mahout on Elastic MapReduce" as a reference). Now I want to copy the results from HDFS to S3 (to use them in …).
I've found the mistake. The main problem is not:
java.net.UnknownHostException: unknown host: my.bucket
but:
2012-09-06 13:27:33,909 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
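The failing source path isn't shown above, but judging from the fix below it presumably had only two slashes, something like:

--arg --src --arg 'hdfs://my.bucket/prj1/seqfiles'

With two slashes, Hadoop parses the first path component ('my.bucket') as a namenode hostname, which is what produces the UnknownHostException and, in turn, the "Failed to get source file system" error; hdfs:/// (three slashes) resolves the path against the cluster's default filesystem instead.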
So, after adding one more slash to the source path, the job started without problems. The correct command is:
elastic-mapreduce --jobflow $JOBID \
  --jar s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
  --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
  --arg --src --arg 'hdfs:///my.bucket/prj1/seqfiles' \
  --arg --dest --arg 's3://my.bucket/prj1/seqfiles'
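For what it's worth, on newer EMR releases (where the old elastic-mapreduce Ruby CLI has been replaced by the AWS CLI) an equivalent copy step can usually be added with aws emr add-steps and command-runner.jar. This is only a sketch under that assumption; the cluster id is a placeholder:

aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=CUSTOM_JAR,Name=CopySeqfilesToS3,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[s3-dist-cp,--src,hdfs:///my.bucket/prj1/seqfiles,--dest,s3://my.bucket/prj1/seqfiles]'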
P.S. It works: the job finished correctly, and I successfully copied a directory containing a 30 GB file.
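If you want to double-check the result, listing the destination is enough; this assumes the aws CLI is installed and configured with access to the bucket (at the time, s3cmd or hadoop fs -ls from the master node would have done the same job):

aws s3 ls s3://my.bucket/prj1/seqfiles/ --recursive --human-readable --summarize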