Why is my containerized Selenium application failing only in AWS Lambda?

Question

I'm trying to run a function in AWS Lambda that depends on Selenium with Firefox/geckodriver. Rather than use a pre-configured runtime, I decided to build a container image and upload that. I wrote a Dockerfile that installs Firefox and Python, downloads geckodriver, and installs my test code:

FROM alpine:latest

RUN apk add firefox python3 py3-pip
RUN pip install requests selenium

RUN mkdir /app
WORKDIR /app

RUN wget -qO gecko.tar.gz https://github.com/mozilla/geckodriver/releases/download/v0.28.0/geckodriver-v0.28.0-linux64.tar.gz
RUN tar xf gecko.tar.gz
RUN mv geckodriver /usr/bin

COPY *.py ./

ENTRYPOINT ["/usr/bin/python3","/app/lambda_function.py"]

The Selenium test code:

#!/usr/bin/env python3
import util
import os
import sys
import requests

def lambda_wrapper():
    api_base = f'http://{os.environ["AWS_LAMBDA_RUNTIME_API"]}/2018-06-01'
    response = requests.get(api_base + '/runtime/invocation/next')
    request_id = response.headers['Lambda-Runtime-Aws-Request-Id']
    try:
        result = selenium_test()
        
        # Send result back
        requests.post(api_base + f'/runtime/invocation/{request_id}/response', json={'url': result})
    except Exception as e:
        # Error reporting
        import traceback
        requests.post(api_base + f'/runtime/invocation/{request_id}/error', json={'errorMessage': str(e), 'traceback': traceback.format_exc(), 'logs': open('/tmp/gecko.log', 'r').read()})
        raise

def selenium_test():
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options
    options = Options()
    options.add_argument('-headless')
    options.add_argument('--window-size 1920,1080')
    
    ffx = Firefox(options=options, log_path='/tmp/gecko.log')
    ffx.get("https://google.com")
    url = ffx.current_url
    ffx.close()
    print(url)

    return url
    

def main():
    # For testing purposes, currently not using the Lambda API even in AWS so that
    # the same container can run on my local machine.
    # Call lambda_wrapper() instead to get geckodriver logs as well (not informative).
    selenium_test()
    

if __name__ == '__main__':
    main()
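
The manual switch described in the comment in main() could also be made automatically. This is only a minimal sketch, assuming that the presence of the AWS_LAMBDA_RUNTIME_API environment variable (the one lambda_wrapper() already reads) is enough to tell Lambda apart from a local docker run. The helper name choose_entrypoint is hypothetical and not part of the original code:

```python
import os


def choose_entrypoint(environ=None):
    """Return the name of the function main() should call.

    Assumption: AWS_LAMBDA_RUNTIME_API is set only when the container
    runs inside Lambda, so its presence selects the Runtime API wrapper.
    """
    env = os.environ if environ is None else environ
    if env.get("AWS_LAMBDA_RUNTIME_API"):
        return "lambda_wrapper"
    return "selenium_test"
```

main() could then look up the returned name and call the matching function, so the same image would run unmodified in both places.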

I can build this container on my local machine with docker build -t lambda-test . and then run it successfully with docker run -m 512M lambda-test.

However, the exact same container crashes when I upload it to Lambda and run it there. I set the memory limit to 1024 MB and the timeout to 30 seconds. The traceback says that Firefox was unexpectedly killed by a signal:

START RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30 Version: $LATEST
/app/lambda_function.py:29: DeprecationWarning: use service_log_path instead of log_path
  ffx = Firefox(options=options, log_path='/tmp/gecko.log')
Traceback (most recent call last):
  File "/app/lambda_function.py", line 45, in <module>
    main()
  File "/app/lambda_function.py", line 41, in main
    lambda_wrapper()
  File "/app/lambda_function.py", line 12, in lambda_wrapper
    result = selenium_test()
  File "/app/lambda_function.py", line 29, in selenium_test
    ffx = Firefox(options=options, log_path='/tmp/gecko.log')
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/firefox/webdriver.py", line 170, in __init__
    RemoteWebDriver.__init__(
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status signal

END RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30
REPORT RequestId: 52adeab9-8ee7-4a10-a728-82087ec9de30  Duration: 20507.74 ms   Billed Duration: 21350 ms   Memory Size: 1024 MB    Max Memory Used: 131 MB Init Duration: 842.11 ms    
Unknown application error occurred

I had it upload the geckodriver logs as well, but there wasn't much useful information in there:

1608506540595   geckodriver INFO    Listening on 127.0.0.1:41597
1608506541569   mozrunner::runner   INFO    Running command: "/usr/bin/firefox" "--marionette" "-headless" "--window-size 1920,1080" "-foreground" "-no-remote" "-profile" "/tmp/rust_mozprofileQCapHy"
*** You are running in headless mode.

How can I even begin to debug this? The fact that the exact same container behaves differently depending upon where it's run seems fishy to me, but I'm not knowledgeable enough about Selenium, Docker, or Lambda to pinpoint exactly where the problem is.

Is my docker run command not accurately recreating the environment in Lambda? If so, then what command would I run to better simulate the Lambda environment? I'm not really sure where else to go from here, seeing as I can't actually reproduce the error locally to test with.
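
One way to narrow the gap between a plain docker run and Lambda might be to reproduce the constraints Lambda is documented to impose: a read-only filesystem apart from /tmp, a memory cap, and a non-root user. The following is a rough sketch under those assumptions. It only assembles the command, the helper name and the UID/GID of 1000 are hypothetical, and it does not claim to reproduce Lambda's sandbox exactly or to identify the cause of the crash:

```python
import shlex


def lambda_like_docker_cmd(image, memory_mb=1024, uid=1000, gid=1000):
    """Build a `docker run` argv that approximates some Lambda constraints.

    Assumptions (not a faithful copy of the Lambda sandbox):
      - the root filesystem is read-only and only /tmp is writable
      - the process runs as an unprivileged user (uid/gid are placeholders)
      - memory is capped at the function's configured size
    """
    return [
        "docker", "run", "--rm",
        "--read-only",               # root filesystem not writable
        "--tmpfs", "/tmp",           # writable scratch space at /tmp
        "--memory", f"{memory_mb}m", # memory cap, like the Lambda setting
        "--user", f"{uid}:{gid}",    # run as a non-root user
        image,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the image built above.
    print(shlex.join(lambda_like_docker_cmd("lambda-test")))
```

If the container also fails locally under these flags, the failure can be investigated without a round trip to Lambda; if it still succeeds, the difference lies somewhere this sketch does not cover.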

If anyone wants to take a look at the full code and try building it themselves, the repository is here - the lambda code is in lambda_function.py.

As for prior research, this question a) is about ChromeDriver and b) has had no answers in over a year. The link from that one only has information about how to run a container in Lambda, which I'm already doing. This answer is almost my problem, but I know that there's not a version mismatch because the container works on my laptop just fine.

Source: https://stackoverflow.com/questions/65385952/why-is-my-containerized-selenium-application-failing-only-in-aws-lambda

Tags

Docker

selenium

aws-lambda

containers

geckodriver