I'm having an issue where my Hadoop job on AWS's EMR is not being saved to S3. When I run the job on a smaller sample, the job stores the output just fine. When I run th
This turned out to be a bug on AWS's part, and they've fixed it in the latest AMI version 2.2.1, briefly described in these release notes.
The long explanation I got from AWS is that when the reducer output files are larger than the block limit for S3 (i.e. 5GB?), a multipart upload is used, but there was no proper error checking on it. That is why the job would sometimes save its output and other times not.
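To make the failure mode concrete, here is a minimal sketch of what "multipart with error checking" means. This is not the EMR or S3 code. The names `needs_multipart`, `plan_parts`, `upload_with_checks`, and the `upload_part` callback are all hypothetical, and the 5 GB threshold is just the figure quoted above, taken as an assumption.

```python
# Illustrative sketch only; nothing here calls AWS.
# Assumed threshold, based on the "5GB?" figure mentioned above.
SINGLE_UPLOAD_LIMIT = 5 * 1024**3


def needs_multipart(file_size, limit=SINGLE_UPLOAD_LIMIT):
    """Return True when a file is too large for a single upload."""
    return file_size > limit


def plan_parts(file_size, part_size):
    """Split a file into (offset, length) byte ranges of at most part_size."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    return [
        (offset, min(part_size, file_size - offset))
        for offset in range(0, file_size, part_size)
    ]


def upload_with_checks(file_size, part_size, upload_part):
    """Upload every part and fail loudly if any part did not succeed.

    upload_part is a hypothetical callable taking
    (part_number, offset, length) and returning True on success.
    """
    parts = plan_parts(file_size, part_size)
    failed = [
        number
        for number, (offset, length) in enumerate(parts, start=1)
        if not upload_part(number, offset, length)
    ]
    if failed:
        # Without this check a partial upload could be silently dropped,
        # which matches the "sometimes works, sometimes not" symptom.
        raise RuntimeError(f"multipart upload incomplete, failed parts: {failed}")
    return len(parts)
```

The point of the sketch is the final check: if a failed part is not surfaced as an error, the job can report success while the output never appears in S3.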
In case this continues for anyone else, refer to my case number, 62849531.