Where is my AWS EMR reducer output for my completed job (should be on S3, but nothing there)?

深忆病人 2021-01-15 19:01

I'm having an issue where the output of my Hadoop job on AWS's EMR is not being saved to S3. When I run the job on a smaller sample, the job stores the output just fine. When I run th

1 Answer
  • 2021-01-15 19:33

    This turned out to be a bug on AWS's part, and they've fixed it in the latest AMI version 2.2.1, briefly described in these release notes.

    The long explanation I got from AWS is that when a reducer's output file is larger than the block limit for S3 (i.e. 5 GB?), a multipart upload is used, but it lacked proper error checking, which is why the job would sometimes work and sometimes not.

    In case this continues for anyone else, refer to my case number, 62849531.
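
If you want to check whether a finished job was hit by this, one option is to compare the part files actually present in the S3 output prefix against the number of reducers the job ran with. The following is only a minimal sketch: the function name `missing_reducer_parts` is hypothetical, and it assumes the default Hadoop output naming (`part-r-00000` or `part-00000`, optionally with a compression suffix such as `.gz`).

```python
import re

# Assumption: default Hadoop output file names, e.g. "part-r-00003",
# "part-00003", or either with a compression suffix such as ".gz".
PART_FILE = re.compile(r"(?:^|/)part-(?:r-)?(\d+)(?:\.[^/]+)?$")


def missing_reducer_parts(keys, num_reducers):
    """Return the reducer indices that have no part file among the given S3 keys.

    keys         -- iterable of object keys listed under the job's output prefix
    num_reducers -- number of reducers the job was configured with
    """
    found = set()
    for key in keys:
        match = PART_FILE.search(key)
        if match:
            found.add(int(match.group(1)))
    # Any index in 0..num_reducers-1 without a matching file is reported.
    return [i for i in range(num_reducers) if i not in found]
```

The key list can come from any listing of the output prefix (for example the output of `aws s3 ls`). A non-empty result means some reducer output never reached S3, which matches the symptom described above.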
