Why is sort_values() different from sort_values().values?

Submitted by 北慕城南 on 2021-02-20 02:58:41

Question


I want to sort every column of a DataFrame, and I found a way to do it:

df = df.apply(lambda x: x.sort_values())

and I applied it to my data:

text1 = text                                           # text1 still refers to the original, unsorted frame
text = text.apply(lambda x: x.sort_values())           # the function returns a sorted Series
text1 = text1.apply(lambda x: x.sort_values().values)  # the function returns a NumPy array
text.head()
text1.head()

Why does text = text.apply(lambda x: x.sort_values()) give the wrong answer, and what does .values do?

text.head()
    Wave    2881.394531 2880.574219 2879.75293  2878.931641 2878.111328
    N-1     0.220934    0.203666    0.205743    0.196011    0.176293
    N-10    0.432692    0.387074    0.395692    0.355331    0.358963
    N-11    0.483360    0.463233    0.456304    0.428930    0.421482
    N-12    0.365057    0.364417    0.385134    0.352451    0.350513
    N-13    0.492172    0.466263    0.480657    0.439115    0.404883


text1.head()
    Wave    2881.394531 2880.574219 2879.75293  2878.931641 2878.111328
    P+1    -21.297623   -25.141329  -21.097095  -31.380476  -38.847958
    P+2    -12.681051   -14.661134  -13.688742  -16.829298  -20.320133
    P+3    -8.164744    -13.097990  -11.784309  -15.419610  -17.822252
    P+4    -0.023353    -0.926852   -8.036203   -14.583183  -17.071484
    P+5     0.022854    -0.037756   -0.002519   -1.891178   -7.795961

Answer 1:


By default, Pandas operations align data on the index. Consider, for example:

In [19]: df = pd.DataFrame([(10,1),(9,2),(8,3),(7,4)], index=list('ABDC'))

In [20]: df
Out[20]: 
    0  1
A  10  1
B   9  2
D   8  3
C   7  4

When Pandas evaluates df.apply(lambda x: x.sort_values()), it generates the Series:

In [24]: df[0].sort_values()
Out[24]: 
C     7
D     8
B     9
A    10
Name: 0, dtype: int64

In [25]: df[1].sort_values()
Out[25]: 
A    1
B    2
D    3
C    4
Name: 1, dtype: int64

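and then combines these two Series into a resulting DataFrame by aligning their indices. Because each value still carries its original row label, alignment puts it back in the row it came from, which undoes the sort: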

In [21]: df.apply(lambda x: x.sort_values())   
Out[21]: 
    0  1
A  10  1
B   9  2
C   7  4
D   8  3

In contrast, when the lambda returns a NumPy array, there is no index to align on, so Pandas simply pastes the values into the resulting DataFrame in the order they come, keeping the original row labels.

So, when Pandas evaluates df.apply(lambda x: x.sort_values().values), it generates the NumPy arrays:

In [26]: df[0].sort_values().values
Out[26]: array([ 7,  8,  9, 10])

In [27]: df[1].sort_values().values
Out[27]: array([1, 2, 3, 4])

and then combines these two NumPy arrays into a resulting DataFrame, keeping the values in the same order:

In [28]: df.apply(lambda x: x.sort_values().values)   
Out[28]: 
    0  1
A   7  1
B   8  2
D   9  3
C  10  4
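A runnable version of the comparison, assuming only pandas and NumPy: it repeats the .values approach from this answer and, as an alternative not taken from either answer, sorts every column at once with np.sort(..., axis=0) and reattaches the original index and columns. The variable names are hypothetical. to_numpy() is used because the pandas documentation recommends it over .values for getting a NumPy array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(10, 1), (9, 2), (8, 3), (7, 4)], index=list("ABDC"))

# Approach from the answer: returning a bare array means there is no index
# to align on, so each sorted column is pasted back by position.
via_apply = df.apply(lambda x: x.sort_values().values)

# Alternative sketch: axis=0 makes np.sort sort each column independently;
# the original row and column labels are then reattached.
via_numpy = pd.DataFrame(np.sort(df.to_numpy(), axis=0),
                         index=df.index, columns=df.columns)

print(via_apply)
print(via_numpy)
```

In both results the row labels A, B, D, C stay in place while the values move, so a label no longer identifies the row a value originally came from.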



Answer 2:


Welcome to StackOverflow!

Based on the pandas documentation, when apply() is used with the default axis=0, the function receives one column at a time as a Series. sort_values() returns a new, sorted Series that still carries its original index labels, while .values (an attribute, not a method) returns the NumPy array representation of those values, with no index attached. When the applied function returns a Series, apply() lines the results up by index label, which puts every value back in its original row; a NumPy array has no labels, so its values stay in sorted order. That is why you get the wrong result when you use only sort_values().

You can find a more complete explanation in the sort_values(), values, and apply() documentation.
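A small sketch, reusing the example frame from Answer 1, that checks the types involved: the column handed to the function is a Series, sort_values() returns a new Series whose values keep their labels (the original frame is left unchanged), and .values is accessed without parentheses and gives a NumPy ndarray. The names col, sorted_col and arr are hypothetical.

```python
import pandas as pd

# Same example frame as in Answer 1.
df = pd.DataFrame([(10, 1), (9, 2), (8, 3), (7, 4)], index=list("ABDC"))

# With the default axis=0, apply() hands the function one column (a Series).
col = df[0]

# sort_values() returns a new sorted Series; labels travel with their values.
sorted_col = col.sort_values()
print(type(sorted_col).__name__)   # Series
print(list(sorted_col.index))      # ['C', 'D', 'B', 'A']

# .values is an attribute (no parentheses) holding a plain NumPy array, no index.
arr = sorted_col.values
print(type(arr).__name__)          # ndarray
print(arr.tolist())                # [7, 8, 9, 10]

# The original frame is not modified by sort_values().
print(df[0].tolist())              # [10, 9, 8, 7]
```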



Source: https://stackoverflow.com/questions/53292709/why-sort-values-is-diifferent-form-sort-values-values
