Google Cloud ML Engine Error 429 Out of Memory

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-24 01:04:52

Question


I uploaded my model to ML-engine and when trying to make a prediction I receive the following error:

ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
  "error": {
    "code": 429,
    "message": "Prediction server is out of memory, possibly because model size is too big.",
    "status": "RESOURCE_EXHAUSTED"
  }
}

My model size is 151.1 MB. I have already tried all the actions suggested on the Google Cloud website, such as quantization. Is there a possible solution, or anything else I could do to make it work?

Thanks


Answer 1:


Typically a model of this size should not result in an OOM. Since TensorFlow does a lot of lazy initialization, some OOMs won't be detected until the first request initializes the data structures. In rare cases a graph can explode to 10x its on-disk size in memory, causing an OOM.

1) Did you see the prediction error consistently? Due to the way TensorFlow schedules nodes, memory usage for the same graph may differ across runs. Run the prediction multiple times and check whether it returns 429 every time.
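As a sketch, the consistency check above can be scripted with a small loop (the model name and input file below are placeholders, not from the original question):

```shell
# Hypothetical model name and instances file; repeat the same request
# several times and note which attempts return HTTP 429.
for i in 1 2 3 4 5; do
  echo "attempt $i:"
  gcloud ml-engine predict \
    --model my_model \
    --json-instances instances.json \
    || echo "attempt $i failed"
done
```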

2) Please make sure 151.1 MB is the size of your SavedModel directory (not just the graph file), since everything in that directory is loaded by the prediction server.
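A quick way to check this is to sum every file under the export directory rather than looking at a single file. A minimal sketch (the path is a placeholder):

```python
import os

def saved_model_size_mb(export_dir):
    """Return the total size, in MB, of all files under a SavedModel
    export directory (graph, variables, assets)."""
    total = 0
    for root, _dirs, files in os.walk(export_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)

# Hypothetical usage:
# print(f"{saved_model_size_mb('export/my_model/1523456789'):.1f} MB")
```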

3) You can also measure peak memory locally, for instance with top while running gcloud ml-engine local predict, or by loading the model into memory in a Docker container and monitoring it with docker stats or a similar tool. You can also try TensorFlow Serving for debugging (https://www.tensorflow.org/serving/serving_basic) and post the results.
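If you load the model inside a Python process yourself, you can also read the process's own peak resident set size via the standard-library resource module (Unix only). A sketch, assuming you insert your own model-loading call where indicated:

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process so far.

    Note: ru_maxrss is reported in kilobytes on Linux
    but in bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024

# Sketch: load the model here, e.g. tf.saved_model.load(export_dir),
# then compare peak RSS before and after to estimate its in-memory cost.
print(f"peak RSS so far: {peak_rss_mb():.1f} MB")
```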

4) If you find the memory problem is persistent, please contact cloudml-feedback@google.com for further assistance; include your project number and associated account so we can debug further.



Source: https://stackoverflow.com/questions/49304175/google-cloud-ml-engine-error-429-out-of-memory
