Similar unanswered question: Row by row processing of a Dask DataFrame
I'm working with dataframes that are millions of rows long, and so now I'm trying to have al
Dask DataFrame does not support efficient row-by-row iteration or row assignment. In general, these workflows rarely scale well, and they are also quite slow in pandas itself.
Instead, you might consider using the Series.where method, which keeps a value where a condition holds and substitutes another value where it does not. Here is a minimal example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})
In [3]: import dask.dataframe as dd
In [4]: ddf = dd.from_pandas(df, npartitions=2)
In [5]: ddf['z'] = ddf.x.where(ddf.x > ddf.y, ddf.y)
In [6]: ddf.compute()
Out[6]:
   x  y  z
0  1  3  3
1  2  2  2
2  3  1  3
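
To make the contrast with row-by-row processing concrete, here is a small sketch in plain pandas (not Dask) that computes the same column twice: once with an explicit loop over rows, and once with Series.where. The names z_loop and z_where are just illustrative, and the loop is only there to show the pattern being replaced, not as a recommendation.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})

# Row-by-row version: the kind of loop the answer advises against.
z_loop = []
for _, row in df.iterrows():
    z_loop.append(row['x'] if row['x'] > row['y'] else row['y'])

# Vectorized version: keep x where x > y, otherwise take y.
z_where = df['x'].where(df['x'] > df['y'], df['y'])

# Both approaches give the same values.
assert z_where.tolist() == z_loop

df['z'] = z_where
print(df)
```

The where expression is the same one used on the Dask series above, so a condition that can be written this way in pandas should carry over to Dask without any per-row iteration.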