How to load local resource from a python package loaded in AWS PySpark

Posted by 余生颓废 on 2020-05-28 11:59:10

Question


I have uploaded a Python package to AWS EMR for use with PySpark. The package has the following structure, and it includes a resource file (a scikit-learn joblib model):

myetllib
    ├── Dockerfile
    ├── __init__.py
    ├── modules
    │   ├── bin
    │   ├── joblib
    │   ├── joblib-0.14.1.dist-info
    │   ├── numpy
    │   ├── numpy-1.18.4.dist-info
    │   ├── numpy.libs
    │   ├── scikit_learn-0.21.3.dist-info
    │   ├── scipy
    │   ├── scipy-1.4.1.dist-info
    │   └── sklearn
    ├── requirements.txt
    └── mysubmodule
        ├── __init__.py
        ├── model.py
        └── models/mymodel.joblib

I then zip the package and upload it to EMR. I can now import model.py in the console like this:

from myetllib.mysubmodule.model import load_model, run_model

but when I call load_model, joblib raises an error because it cannot find the package resource file, models/mymodel.joblib. The path is set as follows:

import os
import warnings

import joblib

# Directory containing this module; the model file is resolved relative to it.
BASE_PATH = os.path.dirname(os.path.realpath(__file__))
MODEL_PATH = os.path.join(BASE_PATH, "models/my_model.joblib")

def load_model():
    """Load the scikit-learn model via joblib."""
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', category=UserWarning)
        return joblib.load(MODEL_PATH)

and the error is:

NotADirectoryError: [Errno 20] Not a directory: '/mnt/tmp/spark-fc45e56b-06f3-56dd-af44-0ecc93d4gc0d/userFiles-1e3455-a6rf-4adc-592b-bbe41ffa323/etllib-v1.0.0.zip/etllib/mysubmodule/models/my_model.joblib
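The path in this traceback passes through a `.zip` file, which is an archive and not a directory, so ordinary file APIs cannot open anything "below" it. One possible way around this, sketched here under the assumption that the package is being imported straight from the zip, is to ask the package's loader for the bytes with `pkgutil.get_data` and wrap them in a file-like object. The names `demo_pkg` and `read_package_resource` are hypothetical and exist only for this illustration; the throwaway archive stands in for the real upload.

```python
import io
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_package_resource(package, relative_path):
    """Return the bytes of a resource stored inside an importable package.

    The lookup goes through the package's loader, so it works whether the
    package lives in a plain directory or inside a zip archive.
    """
    data = pkgutil.get_data(package, relative_path)
    if data is None:
        raise FileNotFoundError(f"{relative_path} not found in {package}")
    return data


# Build a throwaway zipped package (hypothetical name: demo_pkg).
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "demo_pkg.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "")
    zf.writestr("demo_pkg/models/my_model.bin", b"model-bytes")

# Put the archive itself on sys.path so the package is imported from the zip.
sys.path.insert(0, zip_path)

payload = read_package_resource("demo_pkg", "models/my_model.bin")
buffer = io.BytesIO(payload)  # file-like object holding the resource bytes
print(payload)
```

Since `joblib.load` accepts a file object as well as a path, a buffer like this could in principle be passed to it in place of `MODEL_PATH`. Whether that fits this particular EMR setup is untested here.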

Also, I'm getting another error from sklearn:

NotADirectoryError: [Errno 20] Not a directory: '/mnt/tmp/spark-904b50d2-0407-43e8-bb46-06a7b334a46b/userFiles-5df387de-066e-498a-8dd3-e8329d0e8252/etllib-v1.0.1.zip/etllib/modules/sklearn/__check_build
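A likely factor in this second error is that scikit-learn ships compiled extension modules, and Python's zip import mechanism cannot load those from inside an archive. A minimal sketch of one workaround, assuming the archive can be unpacked on the machine that runs the code: extract it to a real directory and put that directory on `sys.path`. The helper `extract_and_add_to_path` and the package `demo_bundle_pkg` are hypothetical names used only for illustration.

```python
import os
import sys
import tempfile
import zipfile


def extract_and_add_to_path(archive_path, target_dir=None):
    """Unpack a zip archive into a real directory and add it to sys.path."""
    if target_dir is None:
        target_dir = tempfile.mkdtemp(prefix="unzipped_pkg_")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir


# Build a throwaway archive standing in for the uploaded package.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "demo_bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_bundle_pkg/__init__.py", "VALUE = 42\n")
    zf.writestr("demo_bundle_pkg/models/my_model.bin", b"model-bytes")

root = extract_and_add_to_path(archive)

import demo_bundle_pkg  # now imported from the extracted directory

# __file__ points at a real file on disk, so relative resource paths resolve.
resource = os.path.join(
    os.path.dirname(os.path.realpath(demo_bundle_pkg.__file__)),
    "models",
    "my_model.bin",
)
print(os.path.isfile(resource))
```

Once the package is imported from a real directory, a `__file__`-relative path such as `MODEL_PATH` points at an actual file, which would also address the first error. With the layout shown in the question, the vendored `modules` directory would presumably need to be on `sys.path` as well.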

Source: https://stackoverflow.com/questions/61619318/how-to-load-local-resource-from-a-python-package-loaded-in-aws-pyspark
