File Not Found Error in Dask program run on cluster

Submitted by 与世无争的帅哥 on 2019-12-08 04:51:12

Question


I have 4 machines: M1, M2, M3, and M4. The scheduler, the client, and one worker run on M1, and I've put a CSV file on M1. The rest of the machines run workers only.

When I run a program that calls read_csv on this file in Dask, it gives me a "file not found" error.


Answer 1:


When one of your workers tries to load the CSV, it will not find it, because the file is not present on that machine's local disc. This should not be a surprise. You can get around this in a number of ways:

  • copy the file to every worker; this is obviously wasteful in terms of disc space, but the easiest to achieve
  • place the file on a networked filesystem (NFS mount, GlusterFS, HDFS, etc.)
  • place the file on an external storage system such as Amazon S3 and refer to that location
  • load the data in your local process and distribute it with scatter; in this case the data is presumably small enough to fit in memory, so Dask would probably not be doing much for you anyway.
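The last option (scatter) can be sketched as follows. This is a minimal, hypothetical example: it uses an in-process client as a stand-in for the real scheduler on M1, and a small hand-built DataFrame in place of the CSV that would be read with pandas on M1:

```python
import pandas as pd
from dask.distributed import Client

# In-process client as a stand-in for the real cluster; on M1 you would
# instead connect with Client("tcp://M1:8786") (hypothetical address).
client = Client(processes=False)

# In the real setup this would be pd.read_csv("yourfile.csv"), run on M1
# where the file actually exists on local disk.
df = pd.DataFrame({"x": [1, 2, 3, 4]})

# scatter ships the in-memory DataFrame to a worker and returns a Future.
[future] = client.scatter([df])

# Workers now compute on their copy of the data rather than trying to
# open a path that only exists on M1.
total = client.submit(lambda d: int(d["x"].sum()), future).result()
print(total)
client.close()
```

Passing the future to client.submit resolves it to the scattered data on the worker, so no worker ever touches M1's filesystem.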


Source: https://stackoverflow.com/questions/50987030/file-not-found-error-in-dask-program-run-on-cluster
