Paths in AWS Lambda with Python NLTK

时光说笑 2021-01-03 03:00

I'm encountering problems with the NLTK package in AWS Lambda. However, I believe the issue has more to do with incorrect path configuration in Lambda. NLTK is having

3 Answers
  • 2021-01-03 03:18

    A bit late to this party, but if you look just above the snippet you pasted, the NLTK library (v3.2.2) lets you add your own custom paths to the list of paths that is searched.

    # User-specified locations:
    _paths_from_env = os.environ.get('NLTK_DATA', str('')).split(os.pathsep)
    path += [d for d in _paths_from_env if d]
    

    So, now that Lambda allows you to add your own environment variables, you can set the NLTK_DATA environment variable to /var/task/nltk_data when you deploy your function and it should work. I haven't tested it on Lambda though.
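To make the effect of the environment variable concrete, here is a minimal sketch that mirrors the parsing shown in the NLTK snippet above. The function name `paths_from_env` and the second directory in the example are hypothetical, chosen only for illustration; NLTK does this internally at import time.

```python
import os


def paths_from_env(environ):
    """Split an NLTK_DATA-style value into a list of non-empty directories.

    Mirrors the logic of the NLTK snippet quoted above: the value is split
    on os.pathsep (":" on Linux) and empty entries are dropped.
    """
    raw = environ.get("NLTK_DATA", "")
    return [d for d in raw.split(os.pathsep) if d]


# Hypothetical environment, roughly as it might look inside a Lambda function.
# "/tmp/nltk_data" is only an illustrative second entry.
example_env = {
    "NLTK_DATA": os.pathsep.join(["/var/task/nltk_data", "/tmp/nltk_data"])
}

for directory in paths_from_env(example_env):
    print(directory)
```

Because the value is split on `os.pathsep`, more than one directory can be listed in a single `NLTK_DATA` value.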

    I'm not sure if Lambda allowed environment variables when you posted this question, but it should be doable now.

    EDIT 1: Revisiting this while deploying some Python apps to Lambda, I used the solution provided by Matt above and it worked for me.

    nltk.data.path.append("/var/task/nltk_data")

    Before that call, and before calling any functions that require the NLTK corpora, you need to remember to

    import nltk

    Additionally, the corpora need to be downloaded and bundled with your project, in the nltk_data subdirectory that the .append call above points to.
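Putting the import, the append, and the bundled directory together, a handler module might be organised roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the `LAMBDA_TASK_ROOT` environment variable is set by the Lambda runtime, falling back to `/var/task` otherwise. The NLTK calls are left in comments so the helpers can be tried without NLTK installed.

```python
import os
import posixpath


def lambda_nltk_data_dir(environ=os.environ):
    """Return the directory where the bundled corpora are expected.

    Assumption: the deployment zip has nltk_data at its root, and the
    function code is unpacked under LAMBDA_TASK_ROOT (default /var/task).
    """
    root = environ.get("LAMBDA_TASK_ROOT", "/var/task")
    return posixpath.join(root, "nltk_data")


def register_data_dir(search_path, data_dir):
    """Append data_dir to search_path once and return the same list."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path


# In a real handler module (assumes nltk is bundled with the function):
#
#   import nltk
#   register_data_dir(nltk.data.path, lambda_nltk_data_dir())
#
#   def handler(event, context):
#       return nltk.word_tokenize(event["text"])
```

Doing the registration at module level means it runs once per cold start, before any handler invocation touches the corpora.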

    If you use a virtualenv within AWS CodeBuild, the buildspec.yml snippet would look like:

    pre_build:
      commands:
        ...
        - export HOME_DIR=`pwd`
        - mkdir $HOME_DIR/nltk_data/
        - export NLTK_DATA=$HOME_DIR/nltk_data
        - $VIRTUAL_ENV/bin/python2.7 -m nltk.downloader -d $NLTK_DATA punkt
        ...
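The same download step can also be scripted in Python rather than through the `nltk.downloader` command line. The sketch below is an assumption-laden alternative, not part of the original answer: `download_corpora` is a hypothetical helper, it targets Python 3 (the buildspec above uses python2.7), and it relies on `nltk.download(name, download_dir=...)`.

```python
import os


def download_corpora(target_dir, packages, downloader=None):
    """Download each named NLTK package into target_dir.

    downloader defaults to nltk.download, imported lazily so this module
    can be loaded (and tested with a stand-in) without NLTK installed.
    Returns a dict mapping package name to True/False.
    """
    if downloader is None:
        import nltk  # assumes nltk is installed in the build environment
        downloader = nltk.download
    os.makedirs(target_dir, exist_ok=True)
    results = {}
    for name in packages:
        results[name] = bool(downloader(name, download_dir=target_dir))
    return results


# Example build-time usage (needs network access and nltk):
#
#   download_corpora(os.path.join(os.getcwd(), "nltk_data"), ["punkt"])
```

The resulting `nltk_data` directory then has to be included at the root of the deployment zip, as described above.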
    
  • 2021-01-03 03:23

    It seems your current Python code runs from /var/task. I would suggest trying the following (I haven't tried it myself):

    nltk.data.path.append("/var/task/nltk_data")
    
  • 2021-01-03 03:28

    So I've found the answer to this question. After a couple of days of messing around, I finally figured it out. The data.py file in the nltk folder needs to be modified as follows: remove the /usr/... paths and add the folder that Lambda executes from, /var/task/. Also ensure that your nltk_data folder is in the root of your execution zip.

    Not sure why, but using the inline nltk.data.path.append() method does not work with AWS Lambda and the data.py file needs to be modified directly.

    else:
        # Common locations on UNIX & OS X:
        path += [
            str('/var/task/nltk_data')
            #str('/usr/share/nltk_data'),
            #str('/usr/local/share/nltk_data'),
            #str('/usr/lib/nltk_data'),
            #str('/usr/local/lib/nltk_data')
        ]
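Before patching data.py, it may help to check which of the directories NLTK searches actually exist inside the Lambda environment. The helper below is a hypothetical diagnostic sketch; it does not explain why the append approach failed for this poster, it only reports the state of each path.

```python
import os


def describe_search_path(paths):
    """Return (path, exists) pairs for each directory in a search path."""
    return [(p, os.path.isdir(p)) for p in paths]


# Inside a Lambda handler (assumes nltk is bundled with the function):
#
#   import nltk
#   for path, exists in describe_search_path(nltk.data.path):
#       print(path, "found" if exists else "missing")
```

If /var/task/nltk_data shows up as missing, the likely culprit is the layout of the zip rather than the search path itself.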
    