s3-dist-cp and hadoop distcp jobs looping infinitely in EMR
Question

I'm trying to copy 193 GB of data from S3 to HDFS. I'm running the following commands for s3-dist-cp and hadoop distcp:

    s3-dist-cp --src s3a://PathToFile/file1 --dest hdfs:///user/hadoop/S3CopiedFiles/
    hadoop distcp s3a://PathToFile/file1 hdfs:///user/hadoop/S3CopiedFiles/

I'm running these on the master node and monitoring how much data has been transferred. The copy took about an hour, and after everything was copied over, it all gets erased; disk space is shown as 99.8% in the 4 core instances in my