Sort DataFrame index that has a string and number

前端 未结 2 2015
暗喜
暗喜 2021-02-15 16:16

My df DataFrame index looks like this:

Com_Lag_01
Com_Lag_02
Com_Lag_03
Com_Lag_04
Com_Lag_05
Com_Lag_06
Com_Lag_07
Com_Lag_08
Com_Lag_09
Com_Lag_10         


        
相关标签:
2条回答
  • 2021-02-15 16:57

    One could try something like this, by performing a sort on a numbered version of the index

    import pandas as pd
    # Create a DataFrame example
    df = pd.DataFrame(\
        {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
        'Age': [27 ,25 ,22 ,34 ,31],\
        },\
        index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])
    
    # Add of a column containing a numbered version of the index
    df['indexNumber'] = [int(i.split('_')[-1]) for i in df.index]
    # Perform sort of the rows
    df.sort(['indexNumber'], ascending = [True], inplace = True)
    # Deletion of the added column
    df.drop('indexNumber', 1, inplace = True)
    


    Edit 2017 - V1:

    To avoid SettingWithCopyWarning:

    df = df.assign(indexNumber=[int(i.split('_')[-1]) for i in df.index])
    

    Edit 2017 - V2 for Pandas Version 0.21.0

    import pandas as pd
    print(pd.__version__)
    # Create a DataFrame example
    df = pd.DataFrame(\
        {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
        'Age': [27 ,25 ,22 ,34 ,31],\
        },\
        index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])
    
    df.reindex(index=df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values().index)
    
    0 讨论(0)
  • 2021-02-15 17:04

    Solution without new column with DataFrame.reindex by index of sorted Series :

    a = df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values()
    print (a)
    Com_Lag_1      1
    Com_Lag_3      3
    Com_Lag_5      5
    Com_Lag_12    12
    Com_Lag_24    24
    dtype: int32
    
    df = df.reindex(index=a.index)
    print (df)
                Age  Year
    Com_Lag_1    27  1991
    Com_Lag_3    22  2001
    Com_Lag_5    31  1997
    Com_Lag_12   25  2004
    Com_Lag_24   34  2009
    

    But if duplicated values is necessary add new column:

    df = pd.DataFrame(\
        {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
        'Age': [27 ,25 ,22 ,34 ,31],\
        },\
        index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_12'])
    
    print (df)
                Age  Year
    Com_Lag_1    27  1991
    Com_Lag_12   25  2004
    Com_Lag_3    22  2001
    Com_Lag_24   34  2009
    Com_Lag_12   31  1997
    
    df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
    df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
    print (df)
                Age  Year
    Com_Lag_1    27  1991
    Com_Lag_3    22  2001
    Com_Lag_12   25  2004
    Com_Lag_12   31  1997
    Com_Lag_24   34  2009
    
    0 讨论(0)
提交回复
热议问题