Paths in AWS Lambda with Python NLTK

时光说笑 2021-01-03 03:00

I'm encountering problems with the NLTK package in AWS Lambda. However, I believe the issue is related more to path configurations in Lambda being incorrect. NLTK is having

3 answers
  •  囚心锁ツ
    2021-01-03 03:18

    A bit late to this party, but if you look just above the snippet you pasted, the NLTK source (v3.2.2) lets you add your own custom paths to the list of paths that is searched.

    # User-specified locations:
    _paths_from_env = os.environ.get('NLTK_DATA', str('')).split(os.pathsep)
    path += [d for d in _paths_from_env if d]
    

    So, now that Lambda allows you to add your own environment variables, you can set the NLTK_DATA environment variable to /var/task/nltk_data when you deploy your function and it should work. I haven't tested it on Lambda though.

    I'm not sure if Lambda allowed environment variables when you posted this question, but it should be doable now.
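To make the environment-variable route concrete, here is a minimal sketch that mimics what the quoted NLTK snippet does with NLTK_DATA. The function name `paths_from_env` and the second directory `/opt/nltk_data` are hypothetical, used only for illustration; only the parsing logic comes from the snippet above.

```python
import os


def paths_from_env(environ=os.environ):
    """Mimic how the quoted NLTK snippet turns NLTK_DATA into search paths."""
    raw = environ.get("NLTK_DATA", "")
    # os.pathsep is ":" on Linux (and so on Lambda) and ";" on Windows.
    # Empty entries are dropped, exactly as in the quoted list comprehension.
    return [d for d in raw.split(os.pathsep) if d]


# Hypothetical value with two directories joined by the platform separator.
example = {"NLTK_DATA": os.pathsep.join(["/var/task/nltk_data", "/opt/nltk_data"])}
print(paths_from_env(example))
```

Because NLTK runs that code when the module is imported, the variable has to be set before `import nltk` executes. Setting it in the function's configuration takes care of that.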

    EDIT 1: Revisiting this with some Python apps I'm deploying to Lambda, I used the solution provided by Matt above, and it worked for me.

    nltk.data.path.append("/var/task/nltk_data")

    Before that line, and before calling any functions that require the NLTK corpora, you need to remember to

    import nltk

    Additionally, the corpora need to be downloaded and installed in your project, in the nltk_data subdirectory that the .append call above points to.
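Putting the import, the path append, and the bundled corpora together, a handler might look roughly like the sketch below. The helper names (`bundled_nltk_data_dir`, `register_data_dir`), the handler name, and the returned dictionary are assumptions for illustration. It also assumes the `punkt` resource was bundled under `nltk_data`. `LAMBDA_TASK_ROOT` is the runtime variable pointing at the function code, with `/var/task` as the fallback used in this answer.

```python
import os


def bundled_nltk_data_dir(subdir="nltk_data"):
    """Return the directory where corpora shipped with the function should live."""
    # LAMBDA_TASK_ROOT is set by the Lambda runtime; fall back to /var/task.
    task_root = os.environ.get("LAMBDA_TASK_ROOT", "/var/task")
    return os.path.join(task_root, subdir)


def register_data_dir(search_path, data_dir):
    """Append data_dir to a search path list once, so warm invocations do not duplicate it."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path


def handler(event, context):
    # Imported here so the helpers above stay usable without NLTK installed.
    import nltk

    register_data_dir(nltk.data.path, bundled_nltk_data_dir())
    # nltk.data.find raises LookupError if the resource is not on the search path.
    location = nltk.data.find("tokenizers/punkt")
    return {"punkt_found_at": str(location)}
```

Checking for duplicates before appending matters little for correctness, but it keeps `nltk.data.path` from growing if the registration runs on every invocation of a reused container.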

    If using a virtualenv within AWS CodeBuild, the buildspec.yml snippet would look like:

    pre_build:
      commands:
        ...
        - export HOME_DIR=`pwd`
        - mkdir $HOME_DIR/nltk_data/
        - export NLTK_DATA=$HOME_DIR/nltk_data
        - $VIRTUAL_ENV/bin/python2.7 -m nltk.downloader -d $NLTK_DATA punkt
        ...
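After a download step like the one above, it can help to confirm that the data actually landed in the directory before packaging. The sketch below is a hypothetical check, not part of NLTK. It assumes the downloader places `punkt` under `tokenizers/` as either an unpacked directory or a `.zip` file, and the function name `missing_resources` is made up for illustration.

```python
import os


def missing_resources(data_dir, resources):
    """Return the resources that are not present under data_dir.

    A resource counts as present if its directory or its .zip file exists.
    """
    missing = []
    for res in resources:
        base = os.path.join(data_dir, *res.split("/"))
        if not (os.path.isdir(base) or os.path.isfile(base + ".zip")):
            missing.append(res)
    return missing


# Example use in a build step: report what is missing from $NLTK_DATA.
data_dir = os.environ.get("NLTK_DATA", "nltk_data")
print(missing_resources(data_dir, ["tokenizers/punkt"]))
```

A build could fail early when the returned list is non-empty, rather than discovering the missing corpus at the first invocation.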
    
