Question
I would like to apply a natural sort order to a column in a pandas DataFrame. The column that I would like to sort might contain duplicates. I have seen the related question "Naturally sorting Pandas DataFrame"; however, it concerned sorting the index, not an arbitrary column.
Example
import pandas as pd

df = pd.DataFrame({'a': ['a22', 'a20', 'a1', 'a10', 'a3', 'a1', 'a11'], 'b': ['b5', 'b2', 'b11', 'b22', 'b4', 'b1', 'b12']})
     a    b
0  a22   b5
1  a20   b2
2   a1  b11
3  a10  b22
4   a3   b4
5   a1   b1
6  a11  b12
Natural sort by column a:
     a    b
0   a1  b11
1   a1   b1
2   a3   b4
3  a10  b22
4  a11  b12
5  a20   b2
6  a22   b5
Natural sort by column b:
     a    b
0   a1   b1
1  a20   b2
2   a3   b4
3  a22   b5
4   a1  b11
5  a11  b12
6  a10  b22
Answer 1:
You can convert the values to an ordered categorical whose categories are sorted by natsorted, and then use sort_values:
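import natsort as ns

# Build an ordered categorical whose categories are in natural order,
# so that sort_values follows that order instead of plain string order.
df['a'] = pd.Categorical(df['a'], ordered=True, categories=ns.natsorted(df['a'].unique()))
df = df.sort_values('a')
print(df)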
     a    b
5   a1   b1
2   a1  b11
4   a3   b4
3  a10  b22
6  a11  b12
1  a20   b2
0  a22   b5
df['b'] = pd.Categorical(df['b'], ordered=True, categories=ns.natsorted(df['b'].unique()))
df = df.sort_values('b')
print(df)
     a    b
5   a1   b1
1  a20   b2
4   a3   b4
0  a22   b5
2   a1  b11
6  a11  b12
3  a10  b22
Answer 2:
df.sort_values(by=['a'])
and
df.sort_values(by=['b'])
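Note that a plain sort_values compares strings character by character, so 'a10' comes before 'a3', which is not the natural order asked for. Since pandas 1.1, sort_values also accepts a key argument, which is one way to get the numeric ordering. A minimal sketch, assuming every value in the column shares the same prefix and contains a number; numeric_suffix is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['a22', 'a20', 'a1', 'a10', 'a3', 'a1', 'a11'],
    'b': ['b5', 'b2', 'b11', 'b22', 'b4', 'b1', 'b12'],
})

# Plain string comparison: 'a10' sorts before 'a3'.
lexicographic = df.sort_values(by=['a'], kind='mergesort')


def numeric_suffix(series):
    # Hypothetical helper: extract the first run of digits and cast to int.
    # Assumes a common prefix and that every value contains a number.
    return series.str.extract(r'(\d+)', expand=False).astype(int)


# `key` receives the whole column as a Series and must return something
# of the same length; mergesort keeps tied rows in their original order.
natural = df.sort_values(by=['a'], key=numeric_suffix, kind='mergesort')

print(lexicographic)
print(natural)
```

Call .reset_index(drop=True) on the result if you want the index renumbered from 0, as in the question's expected output.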
Answer 3:
We can use a regex to extract the text and integer parts of your columns, and then sort using them. Wrapping this in a function lets you do it for each column separately with ease:
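def natural_sort(df, col):
    # Work on a copy so the helper columns never leak into the caller's frame.
    tmp = df.copy()
    # Split each value into its letter prefix and the digits that follow it.
    tmp[['_str', '_int']] = tmp[col].str.extract(r'([a-zA-Z]*)(\d*)')
    tmp['_int'] = tmp['_int'].astype(int)
    # Sort by prefix first, then numerically, and drop the helper columns.
    return tmp.sort_values(by=['_str', '_int']).drop(['_int', '_str'], axis=1)

df = pd.DataFrame({'a': ['a22', 'a20', 'a1', 'a10', 'a3', 'a1', 'a11'], 'b': ['b5', 'b2', 'b11', 'b22', 'b4', 'b1', 'b12']})
print(natural_sort(df, 'a'))
print(natural_sort(df, 'b'))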
prints:
     a    b
2   a1  b11
5   a1   b1
4   a3   b4
3  a10  b22
6  a11  b12
1  a20   b2
0  a22   b5

     a    b
5   a1   b1
1  a20   b2
4   a3   b4
0  a22   b5
2   a1  b11
6  a11  b12
3  a10  b22
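The regex above assumes a single letter prefix followed by one number. If your values can contain several digit runs (say 'x2y10'), a more general key may help. This is only a sketch under my own assumptions: natural_key and natural_sort_values are hypothetical names, the sample 'id' column is made up for illustration, and lower-casing the text parts is a choice you may not want.

```python
import re

import pandas as pd


def natural_key(value):
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd position holds a digit run, which is compared as an int.
    parts = re.split(r'(\d+)', str(value))
    return tuple(int(p) if i % 2 else p.lower() for i, p in enumerate(parts))


def natural_sort_values(df, col):
    # Rank the distinct values in natural order, then sort rows by that rank.
    ordered = sorted(df[col].unique(), key=natural_key)
    rank = {v: i for i, v in enumerate(ordered)}
    # mergesort is stable, so duplicates keep their original relative order.
    return df.sort_values(by=col, key=lambda s: s.map(rank), kind='mergesort')


# Made-up sample data with two digit runs per value.
df = pd.DataFrame({'id': ['x10y2', 'x2y10', 'x2y9', 'x10y1']})
result = natural_sort_values(df, 'id')
print(result)
```

Ranking the unique values first means the key handed to sort_values is a plain integer column, and the input frame is left untouched.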
Source: https://stackoverflow.com/questions/52366558/natural-sort-a-data-frame-column-in-pandas