I am building a Python 3.6 AWS Lambda deployment package and was facing an issue with SQLite. In my code I am using nltk, which imports sqlite3 internally (via nltk.corpus.reader.panlex_lite), and that import fails on Lambda.
This isn't a solution, but I have an explanation why.
Python 3 has support for sqlite3 in the standard library (stable to the point that pip knows about it and refuses to install pysqlite on Python 3). However, this library requires the SQLite developer tools (C libraries) to be present on the machine at runtime. Amazon's Linux AMI, which AWS Lambda runs on (bare AMI instances), does not have these installed by default. I'm not sure whether this means sqlite support isn't installed at all or just won't work until the libraries are added, because I tested things in the wrong order.
Python 2 does not support sqlite in the standard library; you have to use a third-party lib like pysqlite to get that support. This means the binaries can be built more easily, without depending on the machine's state or path variables.
My suggestion, which I see you've already done, is to just run that function in Python 2.7 if you can (and make your unit testing just that much harder :/).
Because of these limitations (sqlite being baked into Python 3's standard library), it is more difficult to create a Lambda-friendly deployment package. The only things I can suggest are to either petition AWS to add that support to Lambda or (if you can get away without actually using the sqlite pieces in nltk) to copy Anaconda's approach of shipping blank libraries that have the proper methods and attributes but don't actually do anything.
If you're curious about the latter, check out any of the fake/_sqlite3 files in an Anaconda install. The idea is only to avoid import errors.
Building on AusIV's answer, this version works for me on AWS Lambda with NLTK. I created a dummysqllite file to mock the required references:
import importlib.util
import sys

spec = importlib.util.spec_from_file_location("_sqlite3", "/dummysqllite.py")
sys.modules["_sqlite3"] = importlib.util.module_from_spec(spec)
sys.modules["sqlite3"] = importlib.util.module_from_spec(spec)
sys.modules["sqlite3.dbapi2"] = importlib.util.module_from_spec(spec)
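The answer doesn't show what /dummysqllite.py contains, so here is a self-contained sketch of the same mechanics with an assumed stub body. One subtlety worth noting: module_from_spec alone leaves the module empty; calling spec.loader.exec_module actually runs the stub file, which you need if you want its functions defined:

```python
import importlib.util
import os
import sys
import tempfile

# Write a stand-in for /dummysqllite.py (its real contents aren't shown in
# the answer; a connect() that raises makes any accidental use explicit).
path = os.path.join(tempfile.mkdtemp(), "dummysqllite.py")
with open(path, "w") as f:
    f.write("def connect(*args, **kwargs):\n"
            "    raise RuntimeError('sqlite3 is stubbed out')\n")

spec = importlib.util.spec_from_file_location("_sqlite3", path)
stub = importlib.util.module_from_spec(spec)
spec.loader.exec_module(stub)  # run the file so connect() exists

# Alias the stub under every name the import chain might ask for.
for name in ("_sqlite3", "sqlite3", "sqlite3.dbapi2"):
    sys.modules[name] = stub

import sqlite3  # resolves to the stub instead of the real module
```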
Depending on what you're doing with NLTK, I may have found a solution.
The base nltk module imports a lot of dependencies, many of which are not used by substantial portions of its feature set. In my use case, I'm only using nltk.sent_tokenize, which carries no functional dependency on sqlite3 even though sqlite3 gets imported as a dependency.
I was able to get my code working on AWS Lambda by changing
import nltk
to
import imp
import sys
sys.modules["sqlite"] = imp.new_module("sqlite")
sys.modules["sqlite3.dbapi2"] = imp.new_module("sqlite.dbapi2")
import nltk
This dynamically creates empty modules for sqlite and sqlite.dbapi2. When nltk.corpus.reader.panlex_lite tries to import sqlite, it gets our empty module instead of the standard-library version. The import succeeds because the sqlite3 package's own __init__.py is pure Python and simply does a from sqlite3.dbapi2 import *; it is dbapi2 that loads the missing _sqlite3 C extension, so pre-seeding sqlite3.dbapi2 in sys.modules prevents that C import from ever running. It also means, of course, that if nltk tries to actually use the sqlite module, it will fail.
If you're using any functionality that actually depends on sqlite, I'm afraid I can't help. But if you're trying to use other nltk functionality and just need to get around the lack of sqlite, this technique might work.
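As a side note, imp is deprecated (and removed in Python 3.12). If you want the same trick on newer runtimes, types.ModuleType does the same job; this is a variant of the snippet above, not the original answer's code:

```python
import sys
import types

# Seed sys.modules with empty placeholder modules before importing nltk,
# so its import machinery finds these instead of the real sqlite modules.
sys.modules["sqlite"] = types.ModuleType("sqlite")
sys.modules["sqlite3.dbapi2"] = types.ModuleType("sqlite3.dbapi2")

# import nltk  # would now succeed even without the _sqlite3 C extension
```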
You need the sqlite3.so file (as others have pointed out), but the most robust way to get it is to pull from the (semi-official?) AWS Lambda docker images available in lambci/lambda. For example, for Python 3.7, here's an easy way to do this:
First, let's grab the sqlite3.so (library file) from the docker image:
mkdir lib
docker run -v $PWD:$PWD lambci/lambda:build-python3.7 bash -c "cp sqlite3.cpython*.so $PWD/lib/"
Next, we'll make a zipped executable with our requirements and code:
pip install -r requirements.txt -t output
pip install . -t output
zip -r output.zip output
Finally, we add the library file to our archive:
cd lib && zip -r ../output.zip sqlite3.cpython*.so
If you want to use AWS SAM build/packaging, instead copy it into the top level of the Lambda environment package (i.e., next to your other Python files).
As apathyman describes, there isn't a direct solution to this until Amazon bundles the C libraries required for sqlite3 into the AMIs used to run Python on Lambda.
One workaround though, is using a pure Python implementation of SQLite, such as PyDbLite. This side-steps the problem, as a library like this doesn't require any particular C libraries to be installed, just Python.
Unfortunately, this doesn't help you if you are using a library which in turn uses the sqlite3 module.
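To make the idea concrete, here is a toy sketch of the kind of pure-Python storage such a library provides. This is not PyDbLite's actual API, just an illustration of the approach: plain .py files, no C extension, so it deploys to Lambda without issue:

```python
# A tiny in-memory record store: pure Python, so it needs no C libraries.
# Illustrative only; PyDbLite and similar libraries add persistence,
# indexes, and richer querying on top of the same basic idea.
class TinyDB:
    def __init__(self, *fields):
        self.fields = fields
        self.records = []

    def insert(self, **values):
        # Store one record, filling unspecified fields with None.
        self.records.append({f: values.get(f) for f in self.fields})

    def select(self, **criteria):
        # Return all records whose fields match every criterion.
        return [r for r in self.records
                if all(r[k] == v for k, v in criteria.items())]

db = TinyDB("name", "lang")
db.insert(name="panlex", lang="en")
db.insert(name="wordnet", lang="en")
```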
This is a bit of a hack, but I've gotten this working by dropping the _sqlite3.so file from Python 3.6 on CentOS 7 directly into the root of the project being deployed with Zappa to AWS. This means that if you can include _sqlite3.so directly in the root of your ZIP, it should work, since it can then be imported by this line in cpython:
https://github.com/python/cpython/blob/3.6/Lib/sqlite3/dbapi2.py#L27
Not pretty, but it works. You can find a copy of _sqlite3.so here:
https://github.com/Miserlou/lambda-packages/files/1425358/_sqlite3.so.zip
Good luck!
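As a quick sanity check for this drop-in approach, you can ask Python which _sqlite3 binary it would load without actually importing it. Run inside the Lambda, spec.origin should point into your deployment package if the drop-in worked; locally it points at your interpreter's own extension:

```python
import importlib.util

# find_spec locates the module without importing it, so this is safe
# even if the extension itself is broken or missing its dependencies.
spec = importlib.util.find_spec("_sqlite3")
print(spec.origin if spec else "not found")
```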