问题
With a folder with many .feather
files, I would like to load all of them into dask in python.
So far, I have tried the following sourced from a similar question on GitHub https://github.com/dask/dask/issues/1277
files = [...]
dfs = [dask.delayed(feather.read_dataframe)(f) for f in files]
df = dd.concat(dfs)
Unfortunately, this gives me the error TypeError: Truth of Delayed objects is not supported
which is mentioned there, but a workaround is not clear.
Is it possible to do the above in dask?
回答1:
Instead of concat
, which operates on dataframes, you want to use from_delayed, which turns a list of delayed objects, each of which represents a dataframe, into a single logical dataframe
dfs = [dask.delayed(feather.read_dataframe)(f) for f in files]
df = dd.from_delayed(dfs)
If possible, you should also supply the meta=
(a zero-length dataframe, describing the columns, index and dtypes) and divisions=
(the boundary values of the index along the partitions) kwargs.
来源:https://stackoverflow.com/questions/57403908/load-many-feather-files-in-a-folder-into-dask