How can I select data from a dask dataframe by a list of indices?

误落风尘 2021-02-09 06:28

I want to select rows from a dask dataframe based on a list of indices. How can I do that?

Example: Let's say I have the following dask dataframe.

2 Answers
  •  被撕碎了的回忆
    2021-02-09 06:53

    Edit: dask now supports loc on lists:

    ddf_selected = ddf.loc[indices_i_want_to_select]
    

    The following should still work, but is not necessary anymore:

    import pandas as pd
    import dask.dataframe as dd

    # Generate an example dataframe. The index uses only strings here,
    # because an index mixing strings and integers cannot be sorted in Python 3.
    pdf = pd.DataFrame(dict(A=[1, 2, 3, 4, 5], B=[6, 7, 8, 9, 0]), index=['i1', 'i2', 'i3', 'i4', 'i5'])
    ddf = dd.from_pandas(pdf, npartitions=2)

    # List of index labels to select
    l = ['i1', 'i4', 'i5']

    # Build a new dask dataframe containing only the specified indices.
    # meta describes the output, which has the same columns and dtypes as ddf.
    ddf_selected = ddf.map_partitions(lambda x: x[x.index.isin(l)], meta=ddf._meta)
    
