Question
I am using pickle to save my trained ML model. For the learning part, I am using the scikit-learn library and building a RandomForestClassifier:
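import pickle
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, max_depth=20,
                            min_samples_split=2, max_features='auto',
                            oob_score=True, random_state=123456)
rf.fit(X, y)
# Protocol 2 is the highest pickle protocol available on Python 2.7
with open('model.pckl', 'wb') as fp:
    pickle.dump(rf, fp, protocol=2)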
I uploaded this model to S3 and I am fetching it with the boto3 library in AWS Lambda:
import pickle
import uuid
import boto3

s3_client = boto3.client('s3')
bucket = 'mlbucket'
key = 'model.pckl'
# /tmp is the writable directory in the Lambda environment
download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
s3_client.download_file(bucket, key, download_path)
with open(download_path, 'rb') as f:
    model = pickle.load(f)
However, I get the error "ValueError: non-string names in Numpy dtype unpickling" at this line: model = pickle.load(f)
Here's the log:
START RequestId: 3d8a1263-1e3c-11e8-8bdb-03c0ef524c0e Version: $LATEST
non-string names in Numpy dtype unpickling: ValueError
Traceback (most recent call last):
  File "/var/task/function.py", line 31, in handler
    model = pickle.load(f)
  File "/usr/lib64/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib64/python2.7/pickle.py", line 1223, in load_build
    setstate(state)
ValueError: non-string names in Numpy dtype unpickling
I am using Python 2.7 on both my local machine and AWS Lambda. The weird part is that pickle.load() works fine on my local machine.
I used this code to test pickle on my local machine:
with open('/home/Documents/model.pckl', 'rb') as f:
    rf = pickle.load(f)
Answer 1:
I found out that the problem was a library version mismatch.
The libraries that I zipped and uploaded to AWS Lambda (numpy, scipy, etc.) were the latest versions, whereas the libraries on my local machine were older. As soon as I updated the libraries on my local machine, rebuilt the pickle files and re-uploaded them to S3, Lambda started working fine.
So it turns out that not only the Python version but also the library versions matter when pickling objects.
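One way to catch this kind of mismatch earlier is to store the library versions next to the model and compare them before unpickling. The following is only a minimal sketch, not something I used in the original setup: the helper names (collect_versions, save_with_versions, load_with_versions) and the envelope layout are hypothetical, and it assumes each library exposes a __version__ attribute, which numpy, scipy and scikit-learn do.

```python
import importlib
import pickle


def collect_versions(module_names):
    """Return {module name: __version__ or None} for the current environment."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # module is not installed here
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def save_with_versions(obj, path, module_names):
    """Pickle obj together with the versions of the listed libraries."""
    envelope = {
        "versions": collect_versions(module_names),
        # The object is pickled separately, so the envelope itself holds only
        # a dict, strings and bytes and can be read without the libraries.
        "payload": pickle.dumps(obj, protocol=2),
    }
    with open(path, "wb") as fp:
        pickle.dump(envelope, fp, protocol=2)


def load_with_versions(path):
    """Load the object, raising RuntimeError if any library version differs."""
    with open(path, "rb") as fp:
        envelope = pickle.load(fp)
    saved = envelope["versions"]
    current = collect_versions(list(saved))
    mismatched = {
        name: (saved[name], current[name])
        for name in saved
        if saved[name] != current[name]
    }
    if mismatched:
        raise RuntimeError(
            "library versions differ (saved, current): {}".format(mismatched)
        )
    # Only unpickle the real object once the versions agree.
    return pickle.loads(envelope["payload"])
```

For the model above, this would be called as save_with_versions(rf, 'model.pckl', ['numpy', 'scipy', 'sklearn']) on the local machine and load_with_versions(download_path) in the Lambda handler. A mismatch then shows up as a message naming the differing versions instead of an opaque unpickling error. Exact equality is a deliberately strict check; a looser comparison may be enough in practice.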
Source: https://stackoverflow.com/questions/49075045/valueerror-non-string-names-in-numpy-dtype-unpickling-only-on-aws-lambda