Move from pandas to Dask to utilize all local CPU cores

Posted by 做~自己de王妃 on 2020-01-12 08:50:39

Question


Recently I stumbled upon http://dask.pydata.org/en/latest/. I have some pandas code that only runs on a single core, and I wonder how to make use of my other CPU cores. Would dask work well to use all (local) CPU cores? If yes, how compatible is it with pandas?

Could I use multiple CPUs with pandas? So far I read about releasing the GIL but that all seems rather complicated.


Answer 1:


Would dask work well to use all (local) CPU cores?

Yes.

How compatible is it with pandas?

Pretty compatible. Not 100%. You can mix in Pandas and NumPy and even pure Python stuff with Dask if needed.
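As a rough illustration of mixing pandas, NumPy, and pure Python under Dask, here is a minimal sketch using `dask.delayed`. The function names (`make_chunk`, `chunk_mean`, `combine`) and the toy data are made up for this example; only `dask.delayed` and `.compute()` are Dask's own.

```python
import dask
import numpy as np
import pandas as pd


@dask.delayed
def make_chunk(start):
    # pandas step: build one small DataFrame (stand-in for loading real data)
    return pd.DataFrame({"x": np.arange(start, start + 4)})


@dask.delayed
def chunk_mean(df):
    # NumPy step: reduce one chunk to a single number
    return float(np.mean(df["x"].to_numpy()))


@dask.delayed
def combine(means):
    # pure-Python step: average the per-chunk means (chunks are equal-sized)
    return sum(means) / len(means)


# Nothing runs yet; these calls only build a task graph.
means = [chunk_mean(make_chunk(start)) for start in (0, 4, 8)]
overall = combine(means).compute()  # executes the graph, chunks in parallel
print(overall)
```

The three chunks are independent, so Dask is free to process them on different cores before the final combine step.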

Could I use multiple CPUs with pandas?

You could. The easiest way would be to use multiprocessing and keep your data separate: have each job independently read from disk and write to disk, if you can do so efficiently. A significantly harder way is to use mpi4py, which is most useful if you have a multi-computer environment with a professional administrator.
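The multiprocessing approach could look roughly like the sketch below. It assumes the data is already split into several CSV files with hypothetical `key` and `value` columns; the file names and the `process_file` helper are invented for illustration.

```python
import multiprocessing as mp
import os
import tempfile

import pandas as pd


def process_file(args):
    """Worker: read one input file, reduce it with plain pandas, write the result."""
    in_path, out_path = args
    df = pd.read_csv(in_path)
    result = df.groupby("key", as_index=False)["value"].sum()
    result.to_csv(out_path, index=False)
    return out_path


if __name__ == "__main__":
    # Create two tiny hypothetical input files so the sketch is self-contained.
    tmpdir = tempfile.mkdtemp()
    jobs = []
    for i in range(2):
        in_path = os.path.join(tmpdir, f"part{i}.csv")
        out_path = os.path.join(tmpdir, f"part{i}_summary.csv")
        pd.DataFrame({"key": ["a", "a", "b"], "value": [i, 1, 2]}).to_csv(
            in_path, index=False
        )
        jobs.append((in_path, out_path))

    # Each job touches only its own files, so no data is shared between workers.
    with mp.Pool(processes=2) as pool:
        for written in pool.map(process_file, jobs):
            print("wrote", written)
```

Because every worker reads and writes its own files, only short path strings cross the process boundary, which keeps inter-process overhead low.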




Answer 2:


Dask dataframes implement a large fraction of the pandas API. These operations call the very same pandas functions on chunks of your overall dataframe, so you should expect them to be totally compatible.

The resulting computations can run on any of the available schedulers, so you can choose between low-overhead threads and something more complex. The distributed scheduler gives you full control over the split between threads and processes, has more features, and can later be scaled out across a cluster, so it is increasingly the preferred option, even for simple single-machine tasks.

Many pandas operations release the GIL and so work efficiently with threads. Many can also be broken down easily into parallel chunks, but some cannot: those will either be slower (such as joins that require shuffles) or not work at all (such as multi-indexing). The best way to find out is to give it a try!



Source: https://stackoverflow.com/questions/42649234/move-from-pandas-to-dask-to-utilize-all-local-cpu-cores
