Pandas sort_values does not sort numbers correctly

耶瑟儿~ 2021-01-02 07:41

I'm new to pandas and working with tabular data in a programming environment. I have sorted a dataframe by a specific column but the answer that pandas spits out is not exac

1 Answer
  • 2021-01-02 08:24

    For whatever reason, you seem to be working with a column of strings, so sort_values is returning a lexicographically sorted result: the values are compared character by character, which puts '10' before '2'.

    Here's an example.

    import pandas as pd

    df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
    df
    
      Col
    0   1
    1   2
    2   3
    3  10
    4  20
    5  19
    
    df.sort_values('Col')
    
      Col
    0   1
    3  10
    5  19
    1   2
    4  20
    2   3
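
    To confirm that this is what is happening, you can check whether pandas treats the column as numeric. The following is only a minimal sketch using the same example frame; the variable names is_numeric and lexsorted are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})

# A column of strings is reported as non-numeric (its dtype is shown as
# object or as a string dtype, depending on the pandas version).
is_numeric = pd.api.types.is_numeric_dtype(df["Col"])
print(df["Col"].dtype, is_numeric)

# Strings are compared character by character, so '10' sorts before '2'.
lexsorted = df.sort_values("Col")["Col"].tolist()
print(lexsorted)
```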
    

    The remedy is to convert it to numeric, either using .astype or pd.to_numeric.

    df.Col = df.Col.astype(float)
    

    Or,

    df.Col = pd.to_numeric(df.Col, errors='coerce')
    
    df.sort_values('Col')
    
       Col
    0    1
    1    2
    2    3
    3   10
    5   19
    4   20
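
    As an aside that is not part of the original answer: if the column needs to stay as strings, the key argument of sort_values (available since pandas 1.1) can apply the conversion for sorting only. A small sketch, assuming the same example frame; sorted_df is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})

# key receives the column as a Series and must return a Series of the
# same length; the rows are ordered by the converted values, while the
# stored values remain strings.
sorted_df = df.sort_values("Col", key=pd.to_numeric)
print(sorted_df)
```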
    

    The main difference between astype and pd.to_numeric is that the latter is more robust at handling non-numeric strings (with errors='coerce' they become NaN, whereas astype raises an error), and it preserves integers when a coercion to float is not necessary (as is seen in this case).
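
    The difference can be seen on a column that contains a value which is not a number. This is a hedged sketch, not output from the original post; the Series s and its 'n/a' entry are invented for the example.

```python
import pandas as pd

# Hypothetical column with one entry that cannot be parsed as a number.
s = pd.Series(['1', '2', 'n/a', '10'])

# astype(float) raises as soon as it meets a string it cannot convert.
try:
    s.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' replaces unparseable strings with NaN.
coerced = pd.to_numeric(s, errors='coerce')

# When every value is an integer, to_numeric returns an integer dtype
# instead of forcing float.
ints = pd.to_numeric(pd.Series(['1', '2', '10']))
print(astype_failed, int(coerced.isna().sum()), ints.dtype)
```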
