Question
Sorry for this loaded question.
I. TL;DR:
The /tmp directory on AWS Lambda keeps filling up when it shouldn't, and I get an ENOSPC error on subsequent requests.
II. The TL version:
I have a microservice built with Node.js (0.10.x) on AWS Lambda that does two things:
- Given a list of URLs, it goes to the relevant sources (S3, CloudFront, Thumbor, etc.) and downloads the physical files into the /tmp directory.
- After downloading all of these files, it compresses them into a tarball and uploads it to S3.
Before posting the code, I just want to say that I'm using some non-standard dependencies, although it's hard to say what counts as standard in JavaScript:
- Bluebird for promises
- Needle for HTTP requests
- del for deleting files and folders
Here is the relevant code for each part:
1. Download files
// Assumed setup (not shown in the question): Bluebird-promisified modules.
// conf and isImage() are the author's own helpers, defined elsewhere.
const Promise = require('bluebird');
const path = require('path');
const fs = Promise.promisifyAll(require('fs'));
const needle = Promise.promisifyAll(require('needle'));
const AWS = require('aws-sdk');
const s3 = Promise.promisifyAll(new AWS.S3());

/**
 * Downloads a single file (images via CloudFront, everything else from S3)
 * and writes it to the temporary directory for later archiving.
 *
 * @param {{relative: String, absolute: String}} url
 * @return {Promise} resolves to url once the file is written (or skipped)
 */
function downloadSingleFile(url) {
  let options = { open_timeout: 5000 };
  let promise;
  // go to CloudFront if image to get already-resized webp image
  if (isImage(url.relative)) {
    options.headers = { 'Content-Type' : 'image/webp' };
    promise = needle
      .getAsync(url.absolute, options)
      .then(resp => [path.join(conf.TEMP_DIR, url.relative), resp.body])
      .catch(e => {
        console.log("Socket hangup for: ", url.relative, url.absolute);
        return Promise.resolve();
      });
  } else {
    // else go directly to S3
    // take care to decode url back into raw string the way s3 expects it to be
    let s3key = decodeURIComponent(url.relative).replace(/^\//, '').replace(/\+/g, ' ');
    promise = s3.getObjectAsync({
      Bucket: conf.S3_STATIC_BUCKET,
      Key: s3key
    }).then(resp => [path.join(conf.TEMP_DIR, url.relative), resp.Body])
      .catch(e => {
        console.log("No such key on S3: ", s3key);
        return Promise.resolve();
      });
  }
  // A failed download resolves to undefined above; skip it rather than
  // throwing a TypeError on args[0], which would reject the whole batch.
  return promise.then(args => args ? fs.writeFileAsync(args[0], args[1]).return(url) : url);
}

/**
 * Downloads a list of urls to the local temp dir, at most 20 at a time.
 *
 * @param {Array} urls list of urls to download
 * @return {Promise}
 */
function downloadAllFiles(urls) {
  return Promise.map(urls, downloadSingleFile, {concurrency: 20});
}
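In `downloadSingleFile`, `path.join(conf.TEMP_DIR, url.relative)` can point into subdirectories (for example `/img/2016/a.jpg`), and writing a file fails with ENOENT when its parent directories don't exist yet. A minimal sketch of a write step that creates parent directories first and skips failed downloads, assuming a modern Node.js (12+) for `fs.promises` and `mkdir`'s `recursive` option; `writeDownload` is a hypothetical name, not part of the question's code.

```javascript
const fsp = require('fs').promises;
const path = require('path');

// Hypothetical helper: write one downloaded body below baseDir.
// Resolves to the written path, or to null when there was nothing to write.
async function writeDownload(baseDir, relativePath, body) {
  // Failed downloads have no body; skip them instead of letting one bad URL
  // reject the whole batch.
  if (body == null) return null;

  const target = path.join(baseDir, relativePath);
  // A path such as "/img/2016/a.jpg" needs "img/2016" to exist first.
  await fsp.mkdir(path.dirname(target), { recursive: true });
  await fsp.writeFile(target, body);
  return target;
}
```

In the question's code it would take the place of the final `fs.writeFileAsync(args[0], args[1])` call.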
2. Compress downloaded files
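const exec = require('child_process').exec;

/**
 * Compresses the downloaded files in the local temp dir into /tmp/foo.tar.gz,
 * removing each original once tar has added it (--remove-files).
 *
 * @return {Promise}
 */
function compressFiles() {
  return new Promise((resolve, reject) => {
    // Note: the archive is itself written under /tmp and is not deleted here.
    exec(`tar pczf /tmp/foo.tar.gz --remove-files ${conf.TEMP_DIR}`,
      function (err, stdout, stderr) {
        if (err) reject(err);
        else resolve();
      });
  });
}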
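A sketch of a variant of `compressFiles` that takes the source directory and archive path as parameters and calls `tar` through `child_process.execFile`, so no shell is involved. `compressDir` is a hypothetical name, and the flags assume a `tar` that supports `-C` (GNU tar and bsdtar both do). Unlike the original, it does not delete the source files, so cleanup is left to the caller.

```javascript
const { execFile } = require('child_process');

// Sketch: archive the *contents* of srcDir into archivePath using tar.
// archivePath must be outside srcDir, or tar would try to include it.
function compressDir(srcDir, archivePath) {
  return new Promise((resolve, reject) => {
    // execFile runs tar without a shell, so paths containing spaces or
    // shell metacharacters are passed through unchanged.
    execFile('tar', ['-czf', archivePath, '-C', srcDir, '.'], (err, stdout, stderr) => {
      if (err) {
        // tar's own diagnostics, e.g. "No space left on device".
        console.error(stderr);
        reject(err);
      } else {
        resolve(archivePath);
      }
    });
  });
}
```

If `archivePath` is under /tmp, as `/tmp/foo.tar.gz` is in the question, the archive occupies the same limited space until it is deleted, for example with `fs.unlinkSync(archivePath)` once the S3 upload has finished.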
III. The problem
When there are too many files to download, so that the /tmp directory grows past AWS Lambda's 500 MB limit, I get an ENOSPC error, which is understandable. However, on subsequent requests, even with far fewer files to download, I keep getting ENOSPC errors for a while before things go back to normal.
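To check whether leftovers from an earlier invocation are what fills the disk, one option is to log how much /tmp holds at the start and end of each invocation. A minimal sketch using only Node's built-in `fs` module (assuming Node 10.10+ for `withFileTypes`); `dirSizeBytes` is a hypothetical helper name. It sums the apparent size of the files it can see, so space still held by deleted-but-still-open files is not counted.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: total apparent size in bytes of regular files under dir.
// Symlinks are not followed; filesystem block overhead is ignored.
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) total += dirSizeBytes(full);
    else if (entry.isFile()) total += fs.statSync(full).size;
  }
  return total;
}
```

For example, `console.log('/tmp MB:', (dirSizeBytes('/tmp') / 1048576).toFixed(1))` at the top of the handler would show whether a new invocation starts with a non-empty /tmp.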
IV. Possible cause
The only explanation, IMO, lies in this cryptic passage from the AWS Lambda docs:
To improve performance, AWS Lambda may choose to retain an instance of your function and reuse it to serve a subsequent request, rather than creating a new copy. Your code should not assume that this will always happen.
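The reuse described in that passage can be reproduced locally: module-scope state is created once per execution environment, and files written to /tmp stay on disk, so a reused environment sees both on the next request. A minimal simulation, assuming a plain `handler` function standing in for `exports.handler` and a hypothetical marker file `reuse-demo-marker`; calling it twice in one Node process plays the role of a warm reuse.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Module scope runs once per execution environment (a "cold start").
let invocationCount = 0;
// Hypothetical marker file, used only to show that /tmp contents survive.
const marker = path.join(os.tmpdir(), 'reuse-demo-marker');

function handler() {
  invocationCount += 1;
  // True if the file is already there, e.g. left behind by an earlier call.
  const sawMarker = fs.existsSync(marker);
  fs.writeFileSync(marker, String(invocationCount));
  return { invocationCount, sawMarker };
}
```

On Lambda, a second request that reports `invocationCount: 2` and `sawMarker: true` was served by a reused environment whose /tmp still held the previous request's files.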
Great, the problem is they don't tell you what to do when this does happen, which leads me to...
V. Things that I've tried
I basically force-delete that directory before and after each invocation with:
if (fs.existsSync(conf.TEMP_DIR)) {
  del.sync([conf.TEMP_DIR], {force: true});
}
But it still didn't seem to start with a clean slate, or rather, a clean /tmp, on each invocation.

I also thought of streaming the downloaded files' content directly to a packaging library that supports streaming in memory as opposed to buffering to disk, e.g. adm-zip, but that library throws too many exceptions for me when used together with needle, which seems to be the best streaming HTTP library out there for Node. IIRC, it was because of the Node version. Maybe if I upgrade my Lambda runtime to 4.x, it will work again. But I want to make absolutely sure I understand the problem before investing in any more hacks:
Why doesn't an AWS Lambda function's non-persistent disk space in its own /tmp directory start fresh on each invocation, despite my best efforts to clear it?
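If the "after" cleanup only runs on the success path, a rejected download or tar step would skip it and leave files in /tmp for the next invocation. One way to guarantee cleanup is to give each invocation its own directory and remove it in a `finally` block. A minimal sketch, assuming Node.js 14.14+ for `fs.rmSync` (so not the 0.10 runtime in the question); `withScratchDir` is a hypothetical helper, and the `work` callback stands in for the download, compress and upload steps.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: run `work` inside a fresh directory under os.tmpdir()
// and remove that directory afterwards, whether `work` resolves or rejects.
async function withScratchDir(prefix, work) {
  // mkdtempSync appends random characters, so each call gets its own
  // new directory even when the execution environment is reused.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  try {
    return await work(dir);
  } finally {
    // recursive: delete the whole tree; force: ignore it if already gone.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

`os.tmpdir()` returns /tmp on Linux unless `TMPDIR` is set. Anything written outside `dir`, such as a tarball at a fixed path like `/tmp/foo.tar.gz`, still needs its own cleanup.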
Source: https://stackoverflow.com/questions/37255550/enospc-error-on-aws-lambda