how can I preprocess input data before making predictions in sagemaker?

穿精又带淫゛_ 提交于 2019-12-12 11:50:24

问题


I am calling a Sagemaker endpoint using java Sagemaker SDK. The data that I am sending needs little cleaning before the model can use it for prediction. How can I do that in Sagemaker.

I have a pre-processing function in the Jupyter notebook instance which is cleaning the training data before passing that data to train the model. Now I want to know if I can use that function while calling the endpoint or is that function already being used? I can show my code if anyone wants?

EDIT 1 Basically, in the pre-processing, I am doing label encoding. Here is my function for preprocessing

def preprocess_data(data):
 print("entering preprocess fn")
 # convert document id & type to labels
 le1 = preprocessing.LabelEncoder()
 le1.fit(data["documentId"])
 data["documentId"]=le1.transform(data["documentId"])
 le2 = preprocessing.LabelEncoder()
 le2.fit(data["documentType"])
 data["documentType"]=le2.transform(data["documentType"])
 print("exiting preprocess fn")
 return data,le1,le2

Here the 'data' is a pandas dataframe.

Now I want to use these le1,le2 at the time of calling endpoint. I want to do this preprocessing in sagemaker itself not in my java code.


回答1:


There is now a new feature in SageMaker, called inference pipelines. This lets you build a linear sequence of two to five containers that pre/post-process requests. The whole pipeline is then deployed on a single endpoint.

https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html




回答2:


One option is to put your pre-processing code as part of an AWS Lambda function and use that Lambda to call the invoke-endpoint of SageMaker, once the pre-processing is done. AWS Lambda supports Python and it should be easy to have the same code that you have in your Jupyter notebook, also within that Lambda function. You can also use that Lambda to call external services such as DynamoDB for lookups for data enrichment.

You can find more information in the SageMaker documentations: https://docs.aws.amazon.com/sagemaker/latest/dg/getting-started-client-app.html




回答3:


You need to write a script and supply that while creating you model. That script would have a input_fn where you can do your preprocessing. Please refer to aws docs for more details.

https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet-training-inference-code-template.html




回答4:


The SageMaker MXNet container is open source.

You add pandas do the docker container here: https://github.com/aws/sagemaker-mxnet-containers/blob/master/docker/1.1.0/Dockerfile.gpu#L4

The repo has instructions on how to build the container as well: https://github.com/aws/sagemaker-mxnet-containers#building-your-image

sagemaker container amazon-sagemaker



来源:https://stackoverflow.com/questions/49581156/how-can-i-preprocess-input-data-before-making-predictions-in-sagemaker

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!