Question
I want to sort every column of a DataFrame independently, and I found a way to do that using
df = df.apply( lambda x: x.sort_values())
and I applied it to my data:
text1 = text
text = text.apply( lambda x : x.sort_values())
text1 = text1.apply( lambda x : x.sort_values().values)
text.head()
text1.head()
Why does text = text.apply( lambda x : x.sort_values()) give the wrong answer, and what does .values do?
text.head()
Wave 2881.394531 2880.574219 2879.75293 2878.931641 2878.111328
N-1 0.220934 0.203666 0.205743 0.196011 0.176293
N-10 0.432692 0.387074 0.395692 0.355331 0.358963
N-11 0.483360 0.463233 0.456304 0.428930 0.421482
N-12 0.365057 0.364417 0.385134 0.352451 0.350513
N-13 0.492172 0.466263 0.480657 0.439115 0.404883
text1.head()
Wave 2881.394531 2880.574219 2879.75293 2878.931641 2878.111328
P+1 -21.297623 -25.141329 -21.097095 -31.380476 -38.847958
P+2 -12.681051 -14.661134 -13.688742 -16.829298 -20.320133
P+3 -8.164744 -13.097990 -11.784309 -15.419610 -17.822252
P+4 -0.023353 -0.926852 -8.036203 -14.583183 -17.071484
P+5 0.022854 -0.037756 -0.002519 -1.891178 -7.795961
Answer 1:
By default, Pandas operations align data based on their index. Consider, for example:
In [19]: df = pd.DataFrame([(10,1),(9,2),(8,3),(7,4)], index=list('ABDC'))
In [20]: df
Out[20]:
0 1
A 10 1
B 9 2
D 8 3
C 7 4
When Pandas evaluates df.apply(lambda x: x.sort_values()), it generates the Series:
In [24]: df[0].sort_values()
Out[24]:
C 7
D 8
B 9
A 10
Name: 0, dtype: int64
In [25]: df[1].sort_values()
Out[25]:
A 1
B 2
D 3
C 4
Name: 1, dtype: int64
and then tries to combine these two Series into a resultant DataFrame. It does that by aligning the indices:
In [21]: df.apply(lambda x: x.sort_values())
Out[21]:
0 1
A 10 1
B 9 2
C 7 4
D 8 3
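The alignment step can be reproduced by hand. The following is a minimal sketch, assuming that building a DataFrame from a dict of Series is a fair stand-in for what apply does when it combines its results; the names s0, s1 and combined are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([(10, 1), (9, 2), (8, 3), (7, 4)], index=list("ABDC"))

s0 = df[0].sort_values()  # values sorted, index order is now C, D, B, A
s1 = df[1].sort_values()  # values sorted, index order is A, B, D, C

# Building a DataFrame from Series aligns them on index labels, not on position.
combined = pd.DataFrame({0: s0, 1: s1})

# Each label still carries its original pair of values, so the sort is lost.
for label in df.index:
    assert combined.loc[label].tolist() == df.loc[label].tolist()

print(combined)
```

Because both sorted Series keep their labels, label C still pairs 7 with 4, exactly as in the original frame.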
In contrast, when the lambda function returns a NumPy array there is no index to align upon. So Pandas merely pastes the values from the NumPy array into a resultant DataFrame in the same order.
So, when Pandas evaluates df.apply(lambda x: x.sort_values().values), it generates the NumPy arrays:
In [26]: df[0].sort_values().values
Out[26]: array([ 7, 8, 9, 10])
In [27]: df[1].sort_values().values
Out[27]: array([1, 2, 3, 4])
and then tries to combine these two NumPy arrays into a resultant DataFrame with the values in the same order:
In [28]: df.apply(lambda x: x.sort_values().values)
Out[28]:
0 1
A 7 1
B 8 2
D 9 3
C 10 4
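If the goal is simply to sort every column independently, the same result can be obtained without apply. This is a rough sketch rather than the approach used above: it assumes all columns share a numeric dtype, and sort_columns_independently is a hypothetical helper name, not a pandas function. It uses to_numpy(), which returns the same kind of array as .values.

```python
import numpy as np
import pandas as pd


def sort_columns_independently(frame: pd.DataFrame) -> pd.DataFrame:
    """Sort each column on its own, keeping the original index labels in place.

    Hypothetical helper; assumes every column has a comparable numeric dtype.
    """
    # np.sort with axis=0 sorts down each column and returns a new array.
    sorted_values = np.sort(frame.to_numpy(), axis=0)
    return pd.DataFrame(sorted_values, index=frame.index, columns=frame.columns)


df = pd.DataFrame([(10, 1), (9, 2), (8, 3), (7, 4)], index=list("ABDC"))

# Returning an array from the lambda avoids index alignment.
via_apply = df.apply(lambda x: x.sort_values().to_numpy())
via_numpy = sort_columns_independently(df)

print(via_numpy)
```

Note that in both results the index labels stay in their original order (A, B, D, C) and no longer identify the rows the values came from.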
Answer 2:
Welcome to StackOverflow!
According to the pandas documentation, sort_values() returns a new sorted object (here a Series, because apply() passes each column to the function as a Series) that keeps its original index labels, while values is an attribute, not a method, that returns the NumPy array representation of the data. Since apply() calls the function on each column and then combines the results, a returned Series is aligned back on its index, which undoes the sort, whereas a returned NumPy array has no index and is inserted in sorted order. That is why you get the unexpected result when you use sort_values() alone.
For a more complete explanation, see the pandas documentation for sort_values(), values, and apply().
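A quick way to check the return types for yourself is sketched below; the Series s and its labels are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 2], index=list("xyz"))

# sort_values() returns a new Series; the labels travel with the values.
sorted_s = s.sort_values()
assert isinstance(sorted_s, pd.Series)
assert sorted_s.index.tolist() == ["y", "z", "x"]

# .values is an attribute (no parentheses) and returns a plain NumPy array,
# which carries no index labels at all.
arr = sorted_s.values
assert isinstance(arr, np.ndarray)
assert arr.tolist() == [1, 2, 3]
```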
Source: https://stackoverflow.com/questions/53292709/why-sort-values-is-diifferent-form-sort-values-values