How can I select data from a dask dataframe by a list of indices?

误落风尘 2021-02-09 06:28

I want to select rows from a dask dataframe based on a list of indices. How can I do that?

Example: Let's say I have the following dask dataframe.

2 Answers
  • 2021-02-09 06:53

    Edit: dask now supports loc on lists:

    ddf_selected = ddf.loc[indices_i_want_to_select]
    

    The following should still work, but is not necessary anymore:

    import pandas as pd
    import dask.dataframe as dd
    
    #generate example dataframe
    pdf = pd.DataFrame(dict(A = [1,2,3,4,5], B = [6,7,8,9,0]), index=['i1', 'i2', 'i3', 4, 5])
    ddf = dd.from_pandas(pdf, npartitions = 2)
    
    #list of indices I want to select
    l = ['i1', 4, 5]
    
    #generate new dask dataframe containing only the specified indices
    #(meta describes the output schema; filtering rows keeps the input's columns and dtypes)
    ddf_selected = ddf.map_partitions(lambda x: x[x.index.isin(l)], meta = ddf._meta)
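
To see what the `map_partitions` lambda does on each partition, here is a minimal sketch in plain pandas, since every partition is handed to the function as an ordinary pandas DataFrame. The partition contents and the helper name `select_by_index` are assumptions made up for illustration.

```python
import pandas as pd

# One partition's worth of data as plain pandas (hypothetical split,
# string labels only to sidestep any mixed-type index problems).
part = pd.DataFrame({"A": [1, 2, 3], "B": [6, 7, 8]}, index=["i1", "i2", "i3"])

# Labels to select; "4" and "5" are assumed to live in another partition.
wanted = ["i1", "4", "5"]


def select_by_index(df, labels):
    """Keep rows whose index label is in `labels`; absent labels are ignored."""
    return df[df.index.isin(labels)]


selected = select_by_index(part, wanted)
print(selected)
```

Because `isin` builds a boolean mask, labels that are missing from a given partition are skipped silently, which is what lets the same function run unchanged on every partition.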
    
  • 2021-02-09 06:53

    With dask version 1.2.0, the example above fails with an error because of the mixed index type (strings and integers). In any case, you can use loc instead, shown here with a string-only index:

    import pandas as pd
    import dask.dataframe as dd
    
    # generate example dataframe
    pdf = pd.DataFrame(dict(A = [1,2,3,4,5], B = [6,7,8,9,0]), index=['i1', 'i2', 'i3', '4', '5'])
    ddf = dd.from_pandas(pdf, npartitions = 2)
    
    # list of indices I want to select
    l = ['i1', '4', '5']
    
    # generate new dask dataframe containing only the specified indices
    # ddf_selected = ddf.map_partitions(lambda x: x[x.index.isin(l)], meta = ddf.dtypes)
    ddf_selected = ddf.loc[l]
    ddf_selected.head()
    