Select rows of pandas dataframe from list, in order of list

前端 未结 3 1673
一向
一向 2020-12-17 00:14

The question was originally asked here as a comment but could not get a proper answer as the question was marked as a duplicate.

For a given pandas.Da

相关标签:
3条回答
  • 2020-12-17 00:24

    One way to overcome this is to make the 'A' column an index and use loc on the newly generated pandas.DataFrame. Eventually, the subsampled dataframe's index can be reset.

    Here is how:

    ret = df.set_index('A').loc[list_of_values].reset_index(inplace=False)
    
    # ret is
    #      A   B
    # 0    3   3
    # 1    4   5
    # 2    6   2 
    

    Note that the drawback of this method is that the original indexing has been lost in the process.

    More on pandas indexing: What is the point of indexing in pandas?

    0 讨论(0)
  • 2020-12-17 00:28

    1] Generic approach for list_of_values.

    In [936]: dff = df[df.A.isin(list_of_values)]
    
    In [937]: dff.reindex(dff.A.map({x: i for i, x in enumerate(list_of_values)}).sort_values().index)
    Out[937]:
       A  B
    2  3  3
    3  4  5
    1  6  2
    

    2] If list_of_values is sorted. You can use

    In [926]: df[df.A.isin(list_of_values)].sort_values(by='A')
    Out[926]:
       A  B
    2  3  3
    3  4  5
    1  6  2
    
    0 讨论(0)
  • 2020-12-17 00:40

    Use merge with helper DataFrame created by list and with column name of matched column:

    df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3,5]})
    
    list_of_values = [3,6,4]
    df1 = pd.DataFrame({'A':list_of_values}).merge(df)
    print (df1)
       A  B
    0  3  3
    1  6  2
    2  4  5
    

    For more general solution:

    df = pd.DataFrame({'A' : [5,6,5,3,4,4,6,5], 'B':range(8)})
    print (df)
       A  B
    0  5  0
    1  6  1
    2  5  2
    3  3  3
    4  4  4
    5  4  5
    6  6  6
    7  5  7
    
    list_of_values = [6,4,3,7,7,4]
    

    #create df from list 
    list_df = pd.DataFrame({'A':list_of_values})
    print (list_df)
       A
    0  6
    1  4
    2  3
    3  7
    4  7
    5  4
    
    #column for original index values
    df1 = df.reset_index()
    #helper column for count duplicates values
    df1['g'] = df1.groupby('A').cumcount()
    list_df['g'] = list_df.groupby('A').cumcount()
    
    #merge together, create index from column and remove g column
    df = list_df.merge(df1).set_index('index').rename_axis(None).drop('g', axis=1)
    print (df)
       A  B
    1  6  1
    4  4  4
    3  3  3
    5  4  5
    
    0 讨论(0)
提交回复
热议问题