I have the following dataframe:
(Index) sample reads yeasts
9 CO ref 10
10 CO raai 20
11 CO tus 30
To use reindex, you first need to create an index from the sample column:
df=df.set_index(['sample']).reindex(["CO ref","CO tus","CO raai"]).reset_index()
Or use an ordered categorical:
cats = ["CO ref","CO tus","CO raai"]
df['sample'] = pd.Categorical(df['sample'], categories=cats, ordered=True)
df = df.sort_values('sample')
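To see both approaches side by side, here is a minimal sketch on a rebuilt version of the question's dataframe. The `reads` column name and its values are my assumption, since the original table is not fully readable; only the `sample` values and the index come from the question.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame; the "reads"
# column name and its values are assumptions made for illustration.
df = pd.DataFrame(
    {"sample": ["CO ref", "CO raai", "CO tus"], "reads": [10, 20, 30]},
    index=[9, 10, 11],
)
order = ["CO ref", "CO tus", "CO raai"]

# Approach 1: move "sample" into the index, reindex, then move it back.
by_reindex = df.set_index("sample").reindex(order).reset_index()

# Approach 2: make "sample" an ordered categorical and sort on it.
by_category = df.copy()
by_category["sample"] = pd.Categorical(
    by_category["sample"], categories=order, ordered=True
)
by_category = by_category.sort_values("sample")

print(by_reindex["sample"].tolist())
print(by_category["sample"].tolist())
```

Both give the same row order. One difference worth knowing: the reindex route replaces the original index (9, 10, 11) with a fresh one and needs the values in sample to be unique, while sort_values keeps the original index labels and also works with repeated values.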
The solution from jezrael is of course correct, and most likely the fastest. But since this is really just a question of restructuring your dataframe, I'd like to show how you can easily do that and, at the same time, select which subset of the values in your sorting column to keep.
The following very simple function will let you specify both the subset and order of your dataframe:
# function to subset and order a pandas
# dataframe of a long format
def order_df(df_input, order_by, order):
    df_output = pd.DataFrame()
    for var in order:
        # rows matching this value, appended in the requested order
        df_append = df_input[df_input[order_by] == var].copy()
        df_output = pd.concat([df_output, df_append])
    return df_output
Here's an example using the iris dataset from plotly express. df['species'].unique() will show you the order of that column:
Output:
array(['setosa', 'versicolor', 'virginica'], dtype=object)
Now, running the following complete snippet with the function above will give you a new specified order. No need for categorical variables or tampering with the index.
Complete code with sample data:
# imports
import pandas as pd
import plotly.express as px
# data
df = px.data.iris()
# function to subset and order a pandas
# dataframe of a long format
def order_df(df_input, order_by, order):
    df_output = pd.DataFrame()
    for var in order:
        # rows matching this value, appended in the requested order
        df_append = df_input[df_input[order_by] == var].copy()
        df_output = pd.concat([df_output, df_append])
    return df_output
# data subsets
df_new = order_df(df_input = df, order_by='species', order=['virginica', 'setosa', 'versicolor'])
df_new['species'].unique()
Output:
array(['virginica', 'setosa', 'versicolor'], dtype=object)
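The subsetting part mentioned above is not shown in that snippet, so here is a small sketch of it. To keep it runnable without plotly, it uses a tiny made-up stand-in for the iris data, and `order_df_subset` is a hypothetical variant of order_df that collects the pieces in a list and concatenates once instead of growing a dataframe inside the loop.

```python
import pandas as pd

# Small made-up stand-in for the iris data (values are illustrative only).
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "setosa", "virginica"],
    "sepal_length": [5.1, 7.0, 6.3, 4.9, 5.8],
})

def order_df_subset(df_input, order_by, order):
    # Variant of order_df: gather the matching rows for each value in the
    # requested order, then concatenate once.
    parts = [df_input[df_input[order_by] == var] for var in order]
    return pd.concat(parts)

# Leaving 'versicolor' out of `order` drops those rows from the result.
df_subset = order_df_subset(df, order_by="species", order=["virginica", "setosa"])
print(df_subset["species"].unique())
print(df_subset.index.tolist())
```

Any value you leave out of order is simply not in the result, and the rows keep their original index labels, so you can still trace them back to the input dataframe.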