I am trying to use the lxml module within AWS Lambda and am having no luck. I downloaded lxml using the following command:
pip install lxml -t folder
Extending these answers, I found the following to work well.
The punchline here is having pip build lxml against statically linked libraries, and installing it into the current directory rather than site-packages.
It also means you can write your Python code as usual, with no need for a separate worker.py or any fiddling with LD_LIBRARY_PATH.
sudo yum groupinstall 'Development Tools'
sudo yum -y install python36-devel python36-pip
sudo ln -s /usr/bin/pip-3.6 /usr/bin/pip3
mkdir lambda && cd lambda
STATIC_DEPS=true pip3 install -t . lxml
zip -r ~/deps.zip *
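With lxml installed into the same directory as your code, the handler can simply import it. A minimal sketch, assuming a handler named handler.handler and a throwaway XML string (both just for illustration):

handler.py
from lxml import etree

def handler(event, context):
    # parse a small document to confirm the statically linked libxml2/libxslt load
    root = etree.fromstring('<items><item id="1">hello</item></items>')
    return {'first_item': root.findtext('item'), 'lxml_version': etree.__version__}

You would drop handler.py into the same lambda directory before zipping, so it ends up at the root of deps.zip next to the installed packages.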
To take it to the next level, use Serverless and Docker to handle everything. Here is a blog post demonstrating this: https://serverless.com/blog/serverless-python-packaging/
I have solved this using the serverless framework and its built-in Docker feature.
Requirement: You have an AWS profile in your .aws folder that can be accessed.
First, install the serverless framework as described here. You can then create a configuration file using the command serverless create --template aws-python3 --name my-lambda. It will create a serverless.yml file and a handler.py with a simple "hello" function. You can check that it works with sls deploy. If that works, serverless is ready to be worked with.
Next, we'll need an additional plugin named "serverless-python-requirements" for bundling Python requirements. You can install it via sls plugin install --name serverless-python-requirements.
This plugin is where all the magic happens that solves the missing lxml package. In the custom -> pythonRequirements section you simply have to add the dockerizePip: non-linux property. Your serverless.yml file could look something like this:
service: producthunt-crawler

provider:
  name: aws
  runtime: python3.8

functions:
  hello:
    # some handler that imports lxml
    handler: handler.hello

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: non-linux
    # Omits tests, __pycache__, *.pyc etc. from dependencies
    slim: true
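For completeness, requirements.txt just needs a single line containing lxml, and the hello function can then import lxml directly. A rough sketch (the HTML snippet and return shape are only illustrative):

handler.py
from lxml import html

def hello(event, context):
    # build a tiny fragment just to prove the compiled lxml package loads in the runtime
    fragment = html.fromstring("<p>Hello from <b>lxml</b></p>")
    return {"statusCode": 200, "body": fragment.text_content()}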
This will run the bundling of Python requirements inside a pre-configured Docker container. After this, you can run sls deploy to see the magic happen and then sls invoke -f my_function to check that it works.
If you have already deployed with serverless and only add the dockerizePip: non-linux option later, make sure to clean up the already-built requirements with sls requirements clean. Otherwise it will just reuse what was built before.
lxml is very sensitive to its runtime environment.
I fixed this issue by building the Lambda zip package in a python:3.x-slim container:
pip install --target=. lxml
zip -r lambda.zip lambda.py lxml
The container image's Python version must be the same as the Python runtime version used by your Lambda function.
Tested successfully with Python 3.6, 3.7, and 3.8.
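As a minimal sketch of the lambda.py packaged above (the handler name lambda.handler and the return value are assumptions for illustration):

lambda.py
from lxml import etree

def handler(event, context):
    # a trivial parse to verify the compiled extension modules load on Lambda
    doc = etree.fromstring("<ok/>")
    return {"tag": doc.tag, "lxml_version": etree.__version__}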
AWS Lambda uses a special version of Linux (as far as I can see).
Using "pip install a_package -t folder" is usually the right approach, as it packages your dependencies into the archive that is sent to Lambda, but the libraries, and especially binary libraries, have to be compatible with the OS and Python version on Lambda.
You could use the xml module included in Python: https://docs.python.org/2/library/xml.etree.elementtree.html
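If your parsing needs are simple, the standard library is often enough and has no native dependencies to package. A minimal sketch, assuming the event carries an XML payload under an "xml" key (purely illustrative):

handler.py
import xml.etree.ElementTree as ET

def handler(event, context):
    # parse the payload with the pure-stdlib ElementTree API
    root = ET.fromstring(event.get("xml", "<empty/>"))
    return {"root_tag": root.tag, "children": [child.tag for child in root]}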
If you really need lxml, this link gives some tricks on how to compile shared libraries for Lambda : http://www.perrygeo.com/running-python-with-compiled-code-on-aws-lambda.html
I faced the same issue.
The link posted by Raphaël Braud was helpful and so was this one: https://nervous.io/python/aws/lambda/2016/02/17/scipy-pandas-lambda/
Using the two links I was able to successfully import lxml and other required packages. Here are the steps I followed:
Run the following script to accumulate dependencies:
set -e -o pipefail
sudo yum -y upgrade
sudo yum -y install gcc python-devel libxml2-devel libxslt-devel
virtualenv ~/env && cd ~/env && source bin/activate
pip install lxml
for dir in lib64/python2.7/site-packages \
           lib/python2.7/site-packages
do
    if [ -d $dir ] ; then
        pushd $dir; zip -r ~/deps.zip .; popd
    fi
done
mkdir -p local/lib
# copy the list of required .so files from /usr/lib64
cp /usr/lib64/<list of required .so files> local/lib/
zip -r ~/deps.zip local/lib
Create handler and worker files as specified in the link. Sample file contents:
handler.py
import os
import subprocess

libdir = os.path.join(os.getcwd(), 'local', 'lib')

def handler(event, context):
    # point LD_LIBRARY_PATH at the bundled .so files so the worker can import lxml
    command = 'LD_LIBRARY_PATH={} python worker.py'.format(libdir)
    output = subprocess.check_output(command, shell=True)
    print output
    return
worker.py:
import lxml

def sample_function(input_string=None):
    return "lxml import successful!"

if __name__ == "__main__":
    result = sample_function()
    print result
Here is how the structure of the zip file looks after the above steps:
deps
├── handler.py
├── worker.py
├── local
│ └── lib
│ ├── libanl.so
│ ├── libBrokenLocale.so
| ....
├── lxml
│ ├── builder.py
│ ├── builder.pyc
| ....
├── <other python packages>
Hope this helps!
I was able to get this working by following the readme for the lambci/lambda Docker build images. Run the following command in your working directory (replacing python3.8 with the version of Python you are using for your Lambda function, and lxml with the version of lxml you want to use):
$ docker run -v $(pwd):/outputs -it lambci/lambda:build-python3.8 \
pip install lxml -t /outputs/
This will leave an lxml folder in your working directory, and possibly some other folders which you can ignore. Move the lxml folder to the same directory as the .py file you are using as your lambda handler. Then zip the .py file together with the lxml folder, as well as any other packages if you are using a virtualenv. I had a virtualenv, and lxml already existed in my site-packages folder, so I had to delete it first. Here are the commands I ran (note that my virtualenv's v-env folder was in the same directory as my .py file):
FUNCTION_NAME="name_of_your_python_file"
cd v-env/lib/python3.8/site-packages &&
rm -rf lxml &&
rm -rf lxml-4.5.1.dist-info &&
zip -r9 ${OLDPWD}/${FUNCTION_NAME}.zip . &&
cd ${OLDPWD} &&
zip -g ${FUNCTION_NAME}.zip ${FUNCTION_NAME}.py &&
zip -r9 ${FUNCTION_NAME}.zip lxml
If you are not using a virtualenv, the commands are simpler:
FUNCTION_NAME="name_of_your_python_file"
zip -g ${FUNCTION_NAME}.zip ${FUNCTION_NAME}.py &&
zip -r9 ${FUNCTION_NAME}.zip lxml
More on creating a .zip file for Lambda with a virtualenv here.