ENOSPC error on AWS Lambda

南笙酒味 submitted on 2021-02-10 13:24:38

Question


Sorry for this loaded question.

I. TL;DR:

The /tmp directory on AWS Lambda keeps filling up when it shouldn't, and gives me an ENOSPC error on subsequent requests.

II. The TL version:

I have a microservice built with Node.js (0.10.x) on AWS Lambda that does 2 things:

  • Given a list of urls, it goes to the relevant sources (S3, CloudFront, thumbor, etc.) and downloads the physical files into the /tmp directory.
  • After downloading all of these files, it compresses them into a tarball and uploads it to S3.

Before posting the code, I just want to say that I'm using some non-standard dependencies, although it's hard to say what is standard in JavaScript:

  • Bluebird for promises
  • Needle for HTTP requests
  • del for deleting files and folders

Here is the relevant code for each part

1. Download files

/**
 * Function downloads a single file from S3, then writes it to a temporary directory to prepare for later archiving.
 *
 * @param {String} url
 * @return {Promise}
 */
function downloadSingleFile(url) {
    let options = { open_timeout: 5000 };
    let promise;

    // go to CloudFront if image to get already-resized webp image
    if (isImage(url.relative)) {
        options.headers = { 'Content-Type' : 'image/webp' };
        promise = needle
            .getAsync(url.absolute, options)
            .then(resp => [path.join(conf.TEMP_DIR, url.relative), resp.body])
            .catch(e => {
                console.log("Socket hangup for: ", url.relative, url.absolute);
                return Promise.resolve();
            });
    } else {
        // else go directly to S3
        // take care to decode url back into raw string the way s3 expects it to be
        let s3key = decodeURIComponent(url.relative).replace(/^\//, '').replace(/\+/g, ' ');
        promise = s3.getObjectAsync({
            Bucket: conf.S3_STATIC_BUCKET,
            Key: s3key
        }).then(resp => [path.join(conf.TEMP_DIR, url.relative), resp.Body])
        .catch(e => {
            console.log("No such key on S3: ", s3key);
            return Promise.resolve();
        });
    }

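    // the catch handlers above resolve with undefined on failure, so skip the write in that case
    return promise.then(args => args ? fs.writeFileAsync(args[0], args[1]).return(url) : undefined);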
}

/**
 * Function downloads a list of urls from S3 to the local tmp dir
 *
 * @param {Array} urls list of urls to download
 * @return {Promise}
 */
function downloadAllFiles(urls) {
    return Promise.map(urls, downloadSingleFile, {concurrency: 20});
}

2. Compress downloaded files

/**
 * Function compresses downloaded files from the local temp dir
 *
 * @return {Promise}
 */
function compressFiles() {
    return new Promise((resolve, reject) => {
        exec(`tar pczf /tmp/foo.tar.gz --remove-files ${conf.TEMP_DIR}`,
            function (err, stdout, stderr) {
                if (err) reject(err);
                else resolve();
        });
    });
}

III. The problem

When there are too many files to download, the /tmp dir grows beyond AWS Lambda's 512 MB limit and I get an ENOSPC error, which is understandable. However, on subsequent requests, even with a much smaller number of files to download, it still gives me ENOSPC errors for a while before it goes back to normal.
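One way to narrow this down is to log what is already sitting in /tmp at the start of each invocation. The following is a minimal sketch using only Node's built-in `fs` and `path` modules; the helper names `dirSize` and `reportUsage` are hypothetical and not part of the code above.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively sum the size in bytes of all files under `dir`.
// Symbolic links are not followed (lstat), so they count as small files.
function dirSize(dir) {
    let total = 0;
    for (const name of fs.readdirSync(dir)) {
        const full = path.join(dir, name);
        const stat = fs.lstatSync(full);
        total += stat.isDirectory() ? dirSize(full) : stat.size;
    }
    return total;
}

// List the top-level entries of `dir` with their sizes, largest first.
function reportUsage(dir) {
    return fs.readdirSync(dir)
        .map(name => {
            const full = path.join(dir, name);
            const stat = fs.lstatSync(full);
            return { name, bytes: stat.isDirectory() ? dirSize(full) : stat.size };
        })
        .sort((a, b) => b.bytes - a.bytes);
}
```

Calling `console.log(JSON.stringify(reportUsage('/tmp')))` as the first statement of the handler would show whether files from an earlier request are still present when a later, smaller request starts.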

IV. Possible cause

The only explanation, IMO, lies in AWS Lambda's cryptic doc

To improve performance, AWS Lambda may choose to retain an instance of your function and reuse it to serve a subsequent request, rather than creating a new copy. Your code should not assume that this will always happen.
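The reuse described in that passage can be observed directly: both module-level variables and files written to /tmp may still be there when a retained instance serves the next request. The sketch below is an illustration only; `detectReuse` and the marker file are hypothetical names, not anything from the Lambda API.

```javascript
const fs = require('fs');

// Module scope runs once per function instance, not once per invocation,
// so this counter keeps its value whenever the instance is reused.
let invocationCount = 0;

// Report whether a marker file from an earlier invocation is still present,
// and create it if it is not.
function detectReuse(markerPath) {
    invocationCount += 1;
    const markerExisted = fs.existsSync(markerPath);
    if (!markerExisted) {
        fs.writeFileSync(markerPath, String(Date.now()));
    }
    return { invocationCount, markerExisted };
}
```

Logging `detectReuse('/tmp/.marker')` at the top of the handler would print `markerExisted: true` on any invocation that landed on a retained instance, which is exactly the case in which leftover downloads would also still occupy space.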

Great, the problem is they don't tell you what to do when this does happen, which leads me to...

V. Things that I've tried

I basically force delete that directory before and after the function is invoked with

if (fs.existsSync(conf.TEMP_DIR))
    del.sync([conf.TEMP_DIR], {force: true});
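One thing worth checking: if conf.TEMP_DIR is a subdirectory of /tmp (an assumption, since its value is not shown), the delete above would not touch the archive that the tar command writes to /tmp/foo.tar.gz. A minimal sketch of emptying a whole directory with Node's built-in modules follows; `removeRecursive` and `emptyDir` are hypothetical helpers, and wiping all of /tmp assumes nothing else in the function needs files there.

```javascript
const fs = require('fs');
const path = require('path');

// Delete a file, a symbolic link, or a directory together with its contents.
function removeRecursive(target) {
    const stat = fs.lstatSync(target);
    if (stat.isDirectory()) {
        for (const name of fs.readdirSync(target)) {
            removeRecursive(path.join(target, name));
        }
        fs.rmdirSync(target);
    } else {
        fs.unlinkSync(target);
    }
}

// Remove every entry inside `dir` but keep `dir` itself.
// Returns the names of the top-level entries that were removed.
function emptyDir(dir) {
    const removed = [];
    for (const name of fs.readdirSync(dir)) {
        removeRecursive(path.join(dir, name));
        removed.push(name);
    }
    return removed;
}
```

Calling `emptyDir('/tmp')` before the downloads start and again after the upload finishes would cover both the download directory and the archive, regardless of where conf.TEMP_DIR points inside /tmp.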

but it still didn't seem to start with a clean slate, or rather, a clean /tmp in each invocation.

I also thought of streaming the downloaded files' content directly to a packaging library that supports streaming in memory as opposed to buffering to disk, e.g. adm-zip, but this library throws too many exceptions for me when used in conjunction with needle, which seems to be the best streamable HTTP library out there for Node. IIRC, it was because of the Node version. Maybe if I upgrade my Lambda runtime to 4.x, it will work again. But I want to make absolutely sure I understand the problem before investing in any more hacks:

Why doesn't an AWS Lambda function's non-persistent disk space in its own /tmp directory start fresh on each invocation, despite my best efforts to clear it?
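For the streaming alternative mentioned in section V, the general shape of processing data in memory without touching /tmp can be sketched with Node's built-in `zlib` module. This only gzips a single stream and does not build a tar archive; a tar-capable streaming library would have to supply the input, and `gzipStream` is a hypothetical helper name.

```javascript
const zlib = require('zlib');

// Gzip a readable stream entirely in memory.
// Resolves with a Buffer holding the compressed bytes; nothing is written to disk.
function gzipStream(source) {
    return new Promise((resolve, reject) => {
        const chunks = [];
        const gzip = zlib.createGzip();
        source.on('error', reject);
        gzip.on('error', reject);
        gzip.on('data', chunk => chunks.push(chunk));
        gzip.on('end', () => resolve(Buffer.concat(chunks)));
        source.pipe(gzip);
    });
}
```

Collecting the output into a Buffer keeps the sketch short, but it trades disk space for memory. For archives near the size described above, piping the compressed stream straight into the upload would be the more realistic design.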

Source: https://stackoverflow.com/questions/37255550/enospc-error-on-aws-lambda
