Load many feather files in a folder into dask

Submitted by 只愿长相守 on 2021-01-27 18:35:27

Question


I have a folder containing many .feather files, and I would like to load all of them into Dask in Python.

So far, I have tried the following, taken from a similar question on GitHub (https://github.com/dask/dask/issues/1277):

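import dask
import dask.dataframe as dd
import feather  # the standalone feather-format package
files = [...]  # paths of the .feather files in the folder
dfs = [dask.delayed(feather.read_dataframe)(f) for f in files]
df = dd.concat(dfs)  # raises TypeError: Truth of Delayed objects is not supported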

Unfortunately, this raises TypeError: Truth of Delayed objects is not supported. That error is mentioned in the issue, but no clear workaround is given there.

Is it possible to do this in Dask?


Answer 1:


Instead of dd.concat, which operates on Dask DataFrames, you want dd.from_delayed, which turns a list of delayed objects, each representing a pandas DataFrame, into a single logical Dask DataFrame:

dfs = [dask.delayed(feather.read_dataframe)(f) for f in files]  # one lazy read per file
df = dd.from_delayed(dfs)  # one partition per file

If possible, you should also supply the meta= keyword (a zero-length DataFrame describing the columns, index and dtypes) and the divisions= keyword (the index values at the partition boundaries). Without meta=, Dask computes the first delayed object just to infer the schema.



Source: https://stackoverflow.com/questions/57403908/load-many-feather-files-in-a-folder-into-dask
