python pandas 3 smallest & 3 largest values

误落风尘 2021-01-20 03:03

How can I find the index of the 3 smallest and 3 largest values in a column in my pandas dataframe? I saw ways to find max and min, but none to get the 3.

5 answers
  • 2021-01-20 03:20

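    You want to take a look at argsort (in numpy and in pandas):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(1, 100, 100).reshape(10, 10))
    # positions of the three smallest values in column 0
    df[0].argsort().values[:3]
    # positions of the three largest values in column 0 (in ascending order)
    df[0].argsort().values[-3:]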
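
    Note that argsort yields integer positions, which only coincide with the index labels when the index is the default 0..n-1 range. A minimal sketch of mapping positions back to labels, using a made-up Series with a string index (the data and variable names are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default (string) index
s = pd.Series([40, 10, 30, 50, 20], index=list("abcde"))

# Positions that would sort the values in ascending order
order = np.argsort(s.to_numpy())

# Translate positions into index labels
smallest_labels = s.index[order[:3]]   # labels of the 3 smallest values
largest_labels = s.index[order[-3:]]   # labels of the 3 largest values, ascending

print(list(smallest_labels))
print(list(largest_labels))
```

    With a default integer index the extra lookup is harmless, so it is a reasonable habit whenever labels rather than positions are wanted.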
    
  • 2021-01-20 03:28
    In [55]: import numpy as np               
    
    In [56]: import pandas as pd              
    
    In [57]: s = pd.Series(np.random.randn(5))
    
    In [58]: s
    Out[58]: 
    0    0.152037
    1    0.194204
    2    0.296090
    3    1.071013
    4   -0.324589
    dtype: float64
    
    In [59]: s.nsmallest(3)  # use s.drop_duplicates().nsmallest(3) if duplicates exist
    Out[59]: 
    4   -0.324589
    0    0.152037
    1    0.194204
    dtype: float64
    
    In [60]: s.nlargest(3)  # use s.drop_duplicates().nlargest(3) if duplicates exist
    Out[60]: 
    3    1.071013
    2    0.296090
    1    0.194204
    dtype: float64
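
    Since the question asks for the index rather than the values, one way to finish this off is to read `.index` from each result. A small sketch that hard-codes the values printed above so it is reproducible (the variable names are illustrative):

```python
import pandas as pd

# Same values as the Series shown in the transcript above
s = pd.Series([0.152037, 0.194204, 0.296090, 1.071013, -0.324589])

# nsmallest/nlargest keep the original index labels,
# so .index gives the row labels directly
smallest_idx = s.nsmallest(3).index.tolist()
largest_idx = s.nlargest(3).index.tolist()

print(smallest_idx)  # [4, 0, 1]
print(largest_idx)   # [3, 2, 1]
```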
    
  • 2021-01-20 03:29
    import pandas as pd
    import numpy as np
    np.random.seed(1)
    x=np.random.randint(1,100,10)
    y=np.random.randint(1000,10000,10)
    
    x
    array([38, 13, 73, 10, 76,  6, 80, 65, 17,  2])
    y
    array([8751, 4462, 6396, 6374, 3962, 3516, 9444, 4562, 5764, 9093])
    
    data = pd.DataFrame({"age": x, "salary": y})

    # take the 5 rows with the largest age, then order those rows by ascending salary
    data.nlargest(5, "age").nsmallest(5, "salary")
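
    The chained call above returns rows rather than the index of the extremes in one column. For the original question, a sketch along the same lines might look like this, with the arrays hard-coded from the output shown above so the result does not depend on the random generator (the variable names are illustrative):

```python
import pandas as pd

# Values copied from the x and y arrays printed above
data = pd.DataFrame({
    "age": [38, 13, 73, 10, 76, 6, 80, 65, 17, 2],
    "salary": [8751, 4462, 6396, 6374, 3962, 3516, 9444, 4562, 5764, 9093],
})

# DataFrame.nsmallest/nlargest take the row count and the column to rank by
smallest_age_idx = data.nsmallest(3, "age").index.tolist()
largest_age_idx = data.nlargest(3, "age").index.tolist()

print(smallest_age_idx)  # [9, 5, 3]
print(largest_age_idx)   # [6, 4, 2]
```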
    
  • 2021-01-20 03:35

    With smaller Series, you're better off just sorting then taking head/tail!

    This is a pandas feature request and should land in 0.14 (some fiddly bits with different dtypes need to be overcome first). For larger Series (> 1000 elements), an efficient solution is kth_smallest from pandas algos. Warning: this function mutates the array it's applied to, so use a copy!

    In [11]: s = pd.Series(np.random.randn(10))
    
    In [12]: s
    Out[12]: 
    0    0.785650
    1    0.969103
    2   -0.618300
    3   -0.770337
    4    1.532137
    5    1.367863
    6   -0.852839
    7    0.967317
    8   -0.603416
    9   -0.889278
    dtype: float64
    
    In [13]: n = 3
    
    In [14]: pd.algos.kth_smallest(s.values.astype(float), n - 1)
    Out[14]: -0.7703374582084163
    
    In [15]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)]
    Out[15]: 
    3   -0.770337
    6   -0.852839
    9   -0.889278
    dtype: float64
    

    If you want this in order:

    In [16]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order()
    Out[16]: 
    9   -0.889278
    6   -0.852839
    3   -0.770337
    dtype: float64
    

    If you're worried about duplicates (joint nth place) you can take the head:

    In [17]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order().head(n)
    Out[17]: 
    9   -0.889278
    6   -0.852839
    3   -0.770337
    dtype: float64
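
    `pd.algos.kth_smallest` and `Series.order()` belong to older pandas and may not be available in recent releases. As a stand-in for the same idea (find the n-th smallest value without a full sort, then filter), here is a sketch that swaps in NumPy's `np.partition`; this is a substitute technique, not the function used in the answer, and the values are copied from the Series printed above:

```python
import numpy as np
import pandas as pd

# Values copied from the transcript above
s = pd.Series([0.785650, 0.969103, -0.618300, -0.770337, 1.532137,
               1.367863, -0.852839, 0.967317, -0.603416, -0.889278])
n = 3

# np.partition returns a partitioned copy, so s itself is not modified.
# After partitioning around position n - 1, that slot holds the n-th smallest value.
threshold = np.partition(s.to_numpy(), n - 1)[n - 1]

# Keep everything at or below the threshold, sort, and cap at n rows
# in case several values tie for n-th place
smallest = s[s <= threshold].sort_values().head(n)

print(threshold)
print(smallest.index.tolist())  # [9, 6, 3]
```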
    
  • 2021-01-20 03:39

    What have you tried? You could sort with s.sort_values() and then call .head(3).index and .tail(3).index on the sorted result.
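
    The sort-then-slice suggestion can be sketched as follows, assuming a recent pandas in which `sort_values()` returns a new sorted Series rather than sorting in place (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series([0.152037, 0.194204, 0.296090, 1.071013, -0.324589])

# Sort once in ascending order, then read the labels off both ends
sorted_s = s.sort_values()
smallest_idx = sorted_s.head(3).index.tolist()
largest_idx = sorted_s.tail(3).index.tolist()  # ascending, so the maximum comes last

print(smallest_idx)  # [4, 0, 1]
print(largest_idx)   # [1, 2, 3]
```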
