Solved: No space left on device in Google Cloud ML BASIC tier. What is the disk size of each tier in Cloud ML?

Submitted by 怎甘沉沦 on 2020-01-16 16:30:21

Question


While training my model on more than 20 GB of data in the BASIC tier of Cloud ML, my jobs fail because no disk space is available on the Cloud ML machines, and I cannot find any details about disk size in the gcloud ML documentation [https://cloud.google.com/ml-engine/docs/tensorflow/machine-types].

I need help deciding which tier to use for my training jobs. Also, the utilisation shown in the Job Details graphs is very low.

{
insertId:  "1klpt2"  
jsonPayload: {
created:  1554434546.3576794   
levelname:  "ERROR"   
lineno:  51   
message:  "Failed to train : [Errno 28] No space left on device"   
pathname:  "/root/.local/lib/python3.5/site-packages/loggerwrapper.py"   
}
labels: {
compute.googleapis.com/resource_id:  ""   
compute.googleapis.com/resource_name:  "cmle-training-10361805218452604847"   
compute.googleapis.com/zone:  ""   
ml.googleapis.com/job_id/log_area:  "root"   
ml.googleapis.com/trial_id:  ""   
}
logName:  "projects/backend/logs/master-replica-0"  
receiveTimestamp:  "2019-03-31T12:32:30.07683Z"  
resource: {
labels: {
job_id:  ""    
project_id:  "backend"    
task_name:  "master-replica-0"    
}
type:  "ml_job"   
}
severity:  "ERROR"  
timestamp:  "2019-03-31T12:32:26.357679367Z"   
}

Answer 1:


Solved: This error was not caused by a lack of disk storage. It was caused by the shared-memory tmpfs filling up: the sklearn fit was consuming all of the shared memory during training. Solution: setting the JOBLIB_TEMP_FOLDER environment variable to /tmp solved the problem.
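For anyone applying the same fix from inside the training script, here is a minimal sketch. It assumes the variable is set at the top of the entry-point module, before any sklearn fit runs, so that joblib picks it up when it creates its temporary memmap folder. The helper names use_disk_backed_joblib_temp and free_gib are hypothetical and only for illustration; free_gib is an optional check of how much room is left on /dev/shm versus /tmp.

```python
import os
import shutil


def use_disk_backed_joblib_temp(folder="/tmp"):
    """Point joblib's temporary files at `folder` instead of shared memory.

    Hypothetical helper: it only sets the JOBLIB_TEMP_FOLDER environment
    variable named in the answer above.
    """
    os.environ["JOBLIB_TEMP_FOLDER"] = folder
    return folder


def free_gib(path):
    """Return free space on `path` in GiB, or None if the path does not exist."""
    if not os.path.exists(path):
        return None
    return shutil.disk_usage(path).free / 2**30


# Set the variable before the estimator is fitted.
use_disk_backed_joblib_temp("/tmp")

# Optional diagnostic: compare shared memory with the disk-backed folder.
for candidate in ("/dev/shm", "/tmp"):
    print(candidate, "free GiB:", free_gib(candidate))
```

The same effect should be achievable without code changes by exporting JOBLIB_TEMP_FOLDER=/tmp in the environment that launches the trainer, if the job setup allows it.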



Source: https://stackoverflow.com/questions/55452871/solved-no-space-left-on-device-in-google-cloudml-basic-tier-what-is-the-disk
