How to do row processing and item assignment in Dask

无人及你 2021-01-15 16:17

Similar unanswered question: Row by row processing of a Dask DataFrame

I'm working with dataframes that are millions of rows long, and so now I'm trying to have al

1 Answer
  • 2021-01-15 16:45

    Dask DataFrame does not support efficient row-by-row iteration or row assignment. These workflows rarely scale well, and they are already quite slow in pandas itself.

    Instead, you might consider using the Series.where method. Here is a minimal example:

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 2, 1]})
    
    In [3]: import dask.dataframe as dd
    
    In [4]: ddf = dd.from_pandas(df, npartitions=2)
    
    In [5]: ddf['z'] = ddf.x.where(ddf.x > ddf.y, ddf.y)
    
    In [6]: ddf.compute()
    Out[6]:
       x  y  z
    0  1  3  3
    1  2  2  2
    2  3  1  3
    