Changing row order in pandas dataframe without losing or messing up data

日久生厌 2020-12-21 08:17

I have the following dataframe:

(Index)    sample    reads yeasts    
9          CO ref    10
10         CO raai   20
11         CO tus    30

I

2 answers
  • 2020-12-21 08:58

    To use reindex, you first need to turn the sample column into the index:

    df=df.set_index(['sample']).reindex(["CO ref","CO tus","CO raai"]).reset_index()
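A minimal runnable sketch of the reindex approach on the question's data. The DataFrame is rebuilt by hand from the table in the question, and "CO tuss" is a hypothetical misspelled label. Note that set_index discards the original index labels (9 to 11), so the result gets a fresh 0-based index, and that a label missing from the data produces a row of NaN rather than an error.

```python
import pandas as pd

# The question's data, rebuilt by hand with the original index labels 9-11
df = pd.DataFrame(
    {"sample": ["CO ref", "CO raai", "CO tus"], "reads yeasts": [10, 20, 30]},
    index=[9, 10, 11],
)

wanted = ["CO ref", "CO tus", "CO raai"]

# Move 'sample' into the index, reorder by label, then move it back to a column
reordered = df.set_index("sample").reindex(wanted).reset_index()
print(reordered)

# A label that does not exist does not raise; it yields a row of NaN
with_typo = df.set_index("sample").reindex(["CO ref", "CO tuss"]).reset_index()
print(with_typo)
```

reindex also raises a ValueError if the sample column contains duplicate values, so this approach assumes each sample name appears only once.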
    

    Alternatively, use an ordered categorical:

    cats = ["CO ref","CO tus","CO raai"]
    df['sample'] = pd.Categorical(df['sample'], categories=cats, ordered=True)
    df = df.sort_values('sample')
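The ordered-categorical approach on the same data, as a sketch (the DataFrame is again rebuilt by hand from the question's table). Unlike the set_index/reindex/reset_index version, sort_values keeps the original index labels, so the rows come out labelled 9, 11, 10.

```python
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame(
    {"sample": ["CO ref", "CO raai", "CO tus"], "reads yeasts": [10, 20, 30]},
    index=[9, 10, 11],
)

cats = ["CO ref", "CO tus", "CO raai"]

# With an ordered categorical, sort_values follows the category order, not the alphabet
df["sample"] = pd.Categorical(df["sample"], categories=cats, ordered=True)
result = df.sort_values("sample")
print(result)
```

Any sample value not listed in cats becomes NaN in the categorical column, and sort_values places such rows last by default (na_position='last').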
    
  • 2020-12-21 09:16

    jezrael's solution is correct and most likely the fastest. But since this is really just a matter of restructuring your dataframe, I'd like to show how you can do that while also letting your procedure choose which subset of values in the sorting column to keep.

    The following simple function lets you specify both the subset and the order of the rows in your dataframe:

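    import pandas as pd

    # function to subset and order a pandas
    # dataframe of a long format
    def order_df(df_input, order_by, order):
        # one boolean filter per requested value, in the requested order
        parts = [df_input[df_input[order_by] == var] for var in order]
        if not parts:
            # nothing requested: return an empty frame with the same columns
            return df_input.iloc[0:0]
        # a single concat avoids re-copying a growing result on every iteration
        return pd.concat(parts)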
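To see the subsetting part in action, here is a sketch applying the same idea to the question's data (rebuilt by hand). The function body is a list-comprehension variant of order_df, not necessarily the answerer's exact code. Because only "CO tus" and "CO ref" are requested, the "CO raai" row is dropped, and the original index labels are kept.

```python
import pandas as pd

def order_df(df_input, order_by, order):
    # One boolean filter per requested value, concatenated in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    if not parts:
        return df_input.iloc[0:0]
    return pd.concat(parts)

# The question's data, rebuilt by hand
df = pd.DataFrame(
    {"sample": ["CO ref", "CO raai", "CO tus"], "reads yeasts": [10, 20, 30]},
    index=[9, 10, 11],
)

# Keep only two of the three samples, in a chosen order
subset = order_df(df, order_by="sample", order=["CO tus", "CO ref"])
print(subset)
```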
    

    Here's an example using the iris dataset from Plotly Express. df['species'].unique() shows the current order of the values in that column:

    Output:

    array(['setosa', 'versicolor', 'virginica'], dtype=object)
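unique() lists values in order of first appearance rather than sorting them, so that output reflects the row order of the data. A tiny illustration with made-up values:

```python
import pandas as pd

s = pd.Series(["virginica", "setosa", "virginica", "versicolor", "setosa"])

# unique() keeps the order in which each value first appears; it does not sort
print(s.unique())
```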
    

    Running the following complete snippet, which includes the function above, gives you the order you specified. There is no need for categorical variables or for tampering with the index.

    Complete code with sample data:

    # imports
    import pandas as pd
    import plotly.express as px
    
    # data
    df = px.data.iris()
    
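    # function to subset and order a pandas
    # dataframe of a long format
    def order_df(df_input, order_by, order):
        # one boolean filter per requested value, in the requested order
        parts = [df_input[df_input[order_by] == var] for var in order]
        if not parts:
            # nothing requested: return an empty frame with the same columns
            return df_input.iloc[0:0]
        # a single concat avoids re-copying a growing result on every iteration
        return pd.concat(parts)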
    
    # data subsets
    df_new = order_df(df_input = df, order_by='species', order=['virginica', 'setosa', 'versicolor'])
    df_new['species'].unique()
    

    Output:

    array(['virginica', 'setosa', 'versicolor'], dtype=object)
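If the loop ever becomes a bottleneck, the same subset-and-order result can be sketched without one: filter once with isin, then sort by each value's position in the requested order via the key argument of sort_values (pandas 1.1 or later). This is a swapped-in technique, not the answerer's method; the name order_df_vectorized and the small toy frame (made-up values, not the real iris data) are hypothetical. kind="stable" keeps rows within each group in their original order, which matches what the loop produces.

```python
import pandas as pd

def order_df_vectorized(df_input, order_by, order):
    # Hypothetical helper: map each value to its position in `order`
    rank = {value: position for position, value in enumerate(order)}
    # Keep only rows whose value was requested
    subset = df_input[df_input[order_by].isin(order)]
    # Sort by requested position; a stable sort preserves row order within groups
    return subset.sort_values(order_by, key=lambda col: col.map(rank), kind="stable")

# Toy data with made-up measurements
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa", "virginica"],
    "petal_length": [1.4, 4.7, 6.0, 1.3, 5.1],
})

df_new = order_df_vectorized(df, "species", ["virginica", "setosa"])
print(df_new)
```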
    