Changing row order in pandas dataframe without losing or messing up data

I have the following dataframe:

(Index)    sample    reads yeasts    
9          CO ref    10
10         CO raai   20
11         CO tus    30

2 Answers

我寻月下人不归

2020-12-21 08:58

To use reindex, you first need to set the sample column as the index:

df=df.set_index(['sample']).reindex(["CO ref","CO tus","CO raai"]).reset_index()
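As a minimal sketch of the reindex approach, the snippet below rebuilds the question's frame by hand and applies the same chain. The column name `reads` and the variable names `order` and `reordered` are assumptions made for illustration. Note that `reindex` needs unique values in `sample`, and that the original index labels (9, 10, 11) are replaced by a fresh 0..2 range after `reset_index()`.

```python
import pandas as pd

# Frame rebuilt from the question; the column name "reads" is an assumption
df = pd.DataFrame(
    {"sample": ["CO ref", "CO raai", "CO tus"], "reads": [10, 20, 30]},
    index=[9, 10, 11],
)

# Desired row order, expressed as values of the "sample" column
order = ["CO ref", "CO tus", "CO raai"]

# Move "sample" into the index, reorder by label, then restore it as a column
reordered = df.set_index("sample").reindex(order).reset_index()
print(reordered)
```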

Or use ordered categorical:

cats = ["CO ref","CO tus","CO raai"]
df['sample'] = pd.CategoricalIndex(df['sample'], ordered=True, categories=cats)
df = df.sort_values('sample')
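A self-contained sketch of the categorical approach, again on a hand-built copy of the question's frame (the column name `reads` and the name `sorted_df` are assumptions). It uses `pd.Categorical` rather than `pd.CategoricalIndex`, which should behave the same way when assigned to a column. Unlike the reindex route, the original index labels travel with their rows; any value missing from `cats` would become NaN.

```python
import pandas as pd

# Frame rebuilt from the question; the column name "reads" is an assumption
df = pd.DataFrame(
    {"sample": ["CO ref", "CO raai", "CO tus"], "reads": [10, 20, 30]},
    index=[9, 10, 11],
)

# The category order defines the sort order
cats = ["CO ref", "CO tus", "CO raai"]
df["sample"] = pd.Categorical(df["sample"], categories=cats, ordered=True)

# Sorting now follows the category order instead of alphabetical order
sorted_df = df.sort_values("sample")
print(sorted_df)
```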

孤城傲影

2020-12-21 09:16

The solution from jezrael is of course correct, and most likely the fastest. But since this is really just a question of restructuring your dataframe, I'd like to show how you can do that and, at the same time, choose which subset of values in the sorting column to keep.

The following very simple function will let you specify both the subset and order of your dataframe:

# Function to subset and order a long-format pandas dataframe
def order_df(df_input, order_by, order):
    # Collect the rows for each value, in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    # Concatenate once; rows whose value is not listed in `order` are dropped
    return pd.concat(parts)

Here's an example using the iris dataset from plotly express. df['species'].unique() will show you the order of that column:

Output:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

Now, running the following complete snippet with the function above gives you the newly specified order. No need for categorical variables or tampering with the index.

Complete code with a data sample:

# imports
import pandas as pd
import plotly.express as px

# data
df = px.data.iris()

# Function to subset and order a long-format pandas dataframe
def order_df(df_input, order_by, order):
    # Collect the rows for each value, in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    # Concatenate once; rows whose value is not listed in `order` are dropped
    return pd.concat(parts)

# data subsets
df_new = order_df(df_input = df, order_by='species', order=['virginica', 'setosa', 'versicolor'])
df_new['species'].unique()

Output:

array(['virginica', 'setosa', 'versicolor'], dtype=object)
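The subset behaviour mentioned earlier can be sketched without plotly. The frame below is small, hand-made, hypothetical data, and `order_df` is restated in a compact form so the snippet runs on its own. Listing only two of the three species drops the third entirely, and the rows keep their original index labels.

```python
import pandas as pd

def order_df(df_input, order_by, order):
    # Pick the rows for each listed value, in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    return pd.concat(parts)

# Small hand-made frame (hypothetical data, for illustration only)
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa"],
    "petal_length": [1.4, 4.7, 6.0, 1.3],
})

# Keep only two species, with 'virginica' first
subset = order_df(df, order_by="species", order=["virginica", "setosa"])
print(subset)
```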
