Return n smallest indexes by column using pandas

被撕碎了的回忆 2020-12-31 07:26

I have the following (simplified) dataframe:

df = pd.DataFrame({'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Y': [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
                   'Z': [20,18


        
3 Answers
  • 2020-12-31 08:13

    You can use apply with nsmallest:

    n = 3
    df.apply(lambda x: pd.Series(x.nsmallest(n).index))
    
    #   X   Y   Z
    #0  A   J   J
    #1  B   I   I
    #2  C   H   H
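
    For reference, here is a self-contained sketch of the same approach. The X and Y columns follow the question; the Z values and the A..J index are assumptions, because the question's listing is cut off and only the answers' output suggests letter labels.

```python
import pandas as pd

# X and Y follow the question; the Z values and the A..J index are
# assumptions made for illustration only.
df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],  # hypothetical values
    },
    index=list("ABCDEFGHIJ"),
)

n = 3
# nsmallest(n) keeps each column's n smallest values in ascending order.
# Wrapping the resulting index in a new Series relabels the rows 0..n-1,
# so the per-column results line up side by side.
smallest_labels = df.apply(lambda col: pd.Series(col.nsmallest(n).index))
print(smallest_labels)
```

    Each column of the result holds the index labels of that column's n smallest values, smallest first.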
    
  • 2020-12-31 08:20

    Faster numpy solution with numpy.argsort:

    N = 3
    a = np.argsort(-df.values, axis=0)[-1:-1-N:-1]
    print (a)
    [[0 9 9]
     [1 8 8]
     [2 7 7]]
    
    # index the underlying NumPy array: recent pandas versions reject 2-D indexing on an Index
    b = pd.DataFrame(df.index.values[a], columns=df.columns)
    print (b)
       X  Y  Z
    0  A  J  J
    1  B  I  I
    2  C  H  H
    

    Timings:

    In [111]: %timeit (pd.DataFrame(df.index[np.argsort(-df.values, axis=0)[-1:-1-N:-1]], columns=df.columns))
    159 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    In [112]: %timeit (df.apply(lambda x: pd.Series(x.nsmallest(N).index)))
    3.52 ms ± 49.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
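
    The negate-and-reverse slice can be hard to read. As a sketch, assuming no column contains repeated values, it selects the same positions as taking the first N rows of a plain ascending argsort. The Z values and the A..J index below are assumptions, since the question's listing is truncated.

```python
import numpy as np
import pandas as pd

# X and Y follow the question; Z and the A..J index are assumed.
df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],  # hypothetical values
    },
    index=list("ABCDEFGHIJ"),
)

N = 3
values = df.to_numpy()

# The answer's form: sort descending via negation, then walk the last N rows backwards.
a_answer = np.argsort(-values, axis=0)[-1:-1 - N:-1]
# Plain ascending argsort, keeping the first N rows.
a_simple = np.argsort(values, axis=0)[:N]

# With no ties inside a column, both forms pick the same row positions.
same_positions = bool((a_answer == a_simple).all())

# Map positions to labels through a NumPy array, which accepts a 2-D indexer.
labels = pd.DataFrame(df.index.to_numpy()[a_simple], columns=df.columns)
print(labels)
```

    When a column does contain ties, the two forms may order the tied positions differently, so pick one and use it consistently.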
    
  • 2020-12-31 08:24

    First, sort the input dataframe by each column in turn and collect the index labels of each sorted result. Then build a new dataframe from those label lists and return its first n rows.

    def topN(df, n):
        # first, sort the dataframe by each column
        sort_x = df.sort_values(by=['X'], ascending=True)
        sort_y = df.sort_values(by=['Y'], ascending=True)
        sort_z = df.sort_values(by=['Z'], ascending=True)
        # now get a list of the index labels of each sorted df
        index_list_x = sort_x.index.values.tolist()
        index_list_y = sort_y.index.values.tolist()
        index_list_z = sort_z.index.values.tolist()
        # create a dataframe from the lists
        sorted_df = pd.DataFrame(
            {'sorted_x': index_list_x,
             'sorted_y': index_list_y,
             'sorted_z': index_list_z
            })
        # return the top n rows of the sorted dataframe
        return sorted_df.iloc[0:n]
    
    topN(df,3)
    

    Returns:

      sorted_x sorted_y sorted_z
    0        A        J        J
    1        B        I        I
    2        C        H        H
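
    The function above hard-codes the column names X, Y and Z. A minimal sketch of the same sort-then-slice idea that loops over whatever columns the dataframe has could look like this. The name `top_n_labels` is hypothetical, and the Z values and A..J index are assumed because the question's listing is truncated.

```python
import pandas as pd


def top_n_labels(df, n):
    """Return the index labels of the n smallest values in each column."""
    # Sort each column on its own and keep the first n labels of the sorted index.
    return pd.DataFrame(
        {col: df[col].sort_values().index[:n].tolist() for col in df.columns}
    )


# X and Y follow the question; Z and the A..J index are assumed.
df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],  # hypothetical values
    },
    index=list("ABCDEFGHIJ"),
)

result = top_n_labels(df, 3)
print(result)
```

    This version keeps the original column names in the result instead of the sorted_x, sorted_y, sorted_z names used above.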
    