I\'m new to pandas and working with tabular data in a programming environment. I have sorted a dataframe by a specific column but the answer that panda spits out is not exac
For whatever reason, you seem to be working with a column of strings, and sort_values
is returning you a lexsorted result.
Here's an example.
df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
df
Col
0 1
1 2
2 3
3 10
4 20
5 19
df.sort_values('Col')
Col
0 1
3 10
5 19
1 2
4 20
2 3
The remedy is to convert it to numeric, either using .astype
or pd.to_numeric
.
df.Col = df.Col.astype(float)
Or,
df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col')
Col
0 1
1 2
2 3
3 10
5 19
4 20
The only difference b/w astype
and pd.to_numeric
is that the latter is more robust at handling non-numeric strings (they're coerced to NaN
), and will attempt to preserve integers if a coercion to float is not necessary (as is seen in this case).