I'm using AWS to run some data processing. I have 400 spot instances in EC2 with 4 processes each, all of them writing to a single bucket in S3. I've started to get an (apparently random) 503 Slow Down error on some of these requests. Is there a limit I'm running into, and how should I handle it?
AWS documents 503 as the result of a temporary error; it does not reflect a specific limit.
According to the "Best Practices for Using Amazon S3" article's section on handling errors (http://aws.amazon.com/articles/1904/):
500-series errors indicate that a request didn't succeed, but may be retried. Though infrequent, these errors are to be expected as part of normal interaction with the service and should be explicitly handled with an exponential backoff algorithm (ideally one that utilizes jitter). One such algorithm can be found at http://en.wikipedia.org/wiki/Truncated_binary_exponential_backoff.
Particularly if you suddenly begin executing hundreds of PUTs per second into a single bucket, you may find that some requests return a 503 "Slow Down" error while the service works to repartition the load. As with all 500 series errors, these should be handled with exponential backoff.
While less detailed, the S3 Error responses documentation does include 503 Slow Down (http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html).
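In practice that means wrapping each PUT in a retry loop. Here is a minimal sketch of exponential backoff with jitter, assuming boto3; the function name, retry count, and delay cap are just illustrative choices, not anything AWS prescribes:

    import random
    import time

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def put_with_backoff(bucket, key, body, max_retries=8):
        """Upload an object, retrying 500-series errors (including 503 Slow Down)
        with truncated exponential backoff and full jitter."""
        for attempt in range(max_retries):
            try:
                return s3.put_object(Bucket=bucket, Key=key, Body=body)
            except ClientError as err:
                status = err.response["ResponseMetadata"]["HTTPStatusCode"]
                if status < 500 and err.response["Error"]["Code"] != "SlowDown":
                    raise  # 4xx problems won't be fixed by retrying
                # Sleep a random amount up to 2^attempt * 100 ms, capped at ~10 s.
                time.sleep(random.uniform(0, min(10.0, 0.1 * 2 ** attempt)))
        raise RuntimeError("gave up after %d retries for s3://%s/%s" % (max_retries, bucket, key))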
To add to what James said, some internals of S3 partitioning have been discussed publicly, and you can use them to mitigate this going forward, in addition to the exponential backoff that is required in any case.
See here: http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
Briefly: don't store everything under the same key prefix, or you are more likely to see these errors. Find some way to make the very first characters of the prefix as random as possible to avoid hotspots in S3's internal partitioning.
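As a rough illustration of that naming idea (the helper and the hash-based scheme here are just one possible approach, not anything prescribed by AWS):

    import hashlib

    def randomized_key(original_key):
        """Hypothetical helper: prepend a few hex characters derived from the key
        so new objects spread across S3's internal partitions instead of piling
        up under a single prefix."""
        prefix = hashlib.md5(original_key.encode("utf-8")).hexdigest()[:4]
        return "%s/%s" % (prefix, original_key)

    # "logs/2012-03-01/part-0001.gz" becomes "<4 hex chars>/logs/2012-03-01/part-0001.gz"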
From what I've read, Slow Down is a very infrequent error. However, after posting this question I received an email from AWS saying that they had capped my LIST requests to 10 requests per second because I was sending too many to a specific bucket.
I had been using a custom queuing script for this project, which relied on LIST requests to determine the next item to process. After running into this problem I switched to Amazon SQS, which was a lot simpler to implement than I'd expected. No more custom queue, no more massive number of LIST requests.
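For anyone doing something similar, a bare-bones version of that pattern with boto3 and SQS might look like the following (the queue name and object keys are made up):

    import boto3

    sqs = boto3.client("sqs")
    queue_url = sqs.create_queue(QueueName="processing-work-items")["QueueUrl"]

    # Producer side: enqueue each S3 key to process instead of LISTing the bucket.
    sqs.send_message(QueueUrl=queue_url, MessageBody="incoming/part-0001.gz")

    # Worker side: long-poll for the next item rather than polling S3.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        key_to_process = msg["Body"]  # the S3 key this worker should handle
        # ... download and process the object here ...
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])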
Thanks for the answers!