Question
I have created a model endpoint which is InService and deployed on an ml.m4.xlarge instance. I am also using API Gateway to create a RESTful API.
Questions:
1. Is it possible to have my model endpoint InService (or on standby) only when I receive inference requests? For example, by writing a Lambda function or something similar that shuts the endpoint down, so it does not keep accumulating per-hour charges.
2. If q1 is possible, would this cause noticeable latency for end users? It usually takes a couple of minutes for a model endpoint to be created when I configure it for the first time.
3. If q1 is not possible, how would choosing a cheaper instance type affect the time it takes to perform inference? (Say I'm only using the endpoint for an application with a small number of users.)
I am aware of the page that compares different instance types (https://aws.amazon.com/sagemaker/pricing/instance-types/), but does having "moderate" network performance mean that real-time inference may take longer?
Any recommendations are much appreciated. The goal is to avoid burning money when users are not requesting predictions.
Answer 1:
How large is your model? If it is under the 50 MB size limit required by AWS Lambda and the dependencies are small enough, there could be a way to rely directly on Lambda as an execution engine.
If your model is larger than 50 MB, there might still be a way to run it by storing it on EFS. See EFS for Lambda.
Source: https://stackoverflow.com/questions/62765780/is-there-a-way-to-turn-on-sagemaker-model-endpoints-only-when-i-am-receiving-inf