Python pandas dataframe sort_values does not work

Anonymous (unverified), submitted 2019-12-03 08:33:39

Question:

I have the following pandas data frame which I want to sort by 'test_type'

      test_type         tps          mtt        mem        cpu       90th
    0  sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
    1  sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
    2  sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
    3   sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

My code to load the dataframe and sort it is below. The first print call prints the data frame above.

    import pandas as pd

    df = pd.read_csv(file)  # reads from a csv file
    print(df)
    df = df.sort_values(by=['test_type'], ascending=True)
    print('\nAfter sort...')
    print(df)

After sorting and printing the dataframe content, the data frame is still in the same order, as shown in the output.

Program output:

    After sort...
      test_type         tps          mtt        mem        cpu       90th
    0  sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
    1  sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
    2  sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
    3   sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

I expect row 3 (the sso_500 row) to be on top after sorting. Can someone help me figure out why it's not working as it should?

Answer 1:

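Presumably, what you're trying to do is sort by the numerical value after sso_. The sort did run, but test_type holds strings, and strings are compared character by character, so 'sso_1000' sorts before 'sso_500'. You can sort by the number as follows: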

    import numpy as np

    # .ix was removed in pandas 1.0; .iloc accepts the positions returned by argsort
    df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]

This

  1. splits the strings at _

  2. converts what's after this character to its numerical value

  3. finds the indices sorted according to the numerical values

  4. reorders the DataFrame according to these indices
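
The four steps above can be traced one at a time. The following is a minimal sketch, assuming a frame that holds only the test_type values from the question; the intermediate names (parts, numbers, order, sorted_df) are illustrative and not part of the original answer, and .iloc is used for the positional lookup.

```python
import numpy as np
import pandas as pd

# Sample frame with the test_type values from the question (other columns omitted).
df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# 1. Split each string at '_', giving lists such as ['sso', '1000'].
parts = df['test_type'].str.split('_')

# 2. Take the last piece of each list and convert it to an integer.
numbers = parts.str[-1].astype(int)

# 3. Compute the positions that would put those integers in ascending order.
order = np.argsort(numbers.values)

# 4. Reorder the rows by position (.iloc, because argsort returns positions).
sorted_df = df.iloc[order]
print(sorted_df)
```

The original index labels are kept, so the sso_500 row still carries label 3 even though it now comes first.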

Example

    In [15]: df = pd.DataFrame({'test_type': ['sso_1000', 'sso_500']})

    In [16]: df.sort_values(by=['test_type'], ascending=True)
    Out[16]:
      test_type
    0  sso_1000
    1   sso_500

    In [17]: df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]
    Out[17]:
      test_type
    1   sso_500
    0  sso_1000
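
The unchanged order in Out[16] is what a plain string sort is expected to give. A small sketch of the comparison involved, assuming nothing beyond the test_type values from the question (the names s and string_sorted are illustrative):

```python
import pandas as pd

# Strings are compared character by character, so '1' < '5' decides the
# order before the length of the number is ever considered.
print('sso_1000' < 'sso_500')   # True
print(1000 < 500)               # False

# Sorting the labels as strings therefore leaves sso_500 last.
s = pd.Series(['sso_1000', 'sso_1500', 'sso_2000', 'sso_500'])
string_sorted = s.sort_values().tolist()
print(string_sorted)
```

So sort_values is doing its job; the column simply has to be sorted on its numeric part instead of on the raw text.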


Answer 2:

Alternatively, you could extract the numbers from test_type, sort them, and then reindex the DataFrame according to the resulting index.

    df.reindex(df['test_type'].str.extract(r'(\d+)', expand=False)
                              .astype(int).sort_values().index).reset_index(drop=True)
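
The one-liner above can be unpacked into its intermediate values. This is a sketch under the assumption of a frame with only the test_type column; the names digits, new_order, result and result_key are illustrative. The last statement is not from either answer: it assumes pandas 1.1 or newer, where sort_values accepts a key callable that receives the column as a Series.

```python
import pandas as pd

df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# Pull the first run of digits out of each label; expand=False returns a Series.
digits = df['test_type'].str.extract(r'(\d+)', expand=False)

# Convert to int and sort; the index of the sorted Series lists the
# original row labels in their new order.
new_order = digits.astype(int).sort_values().index

# Reindex the frame by those labels, then renumber the rows from 0.
result = df.reindex(new_order).reset_index(drop=True)
print(result)

# Assumed alternative (pandas >= 1.1): sort on a derived key directly.
result_key = df.sort_values(
    by='test_type',
    key=lambda col: col.str.extract(r'(\d+)', expand=False).astype(int),
).reset_index(drop=True)
print(result_key)
```

Both variants should leave sso_500 in the first row; the key form avoids the separate reindex step, at the cost of requiring a newer pandas.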


