Comparing two DataFrames, specific questions


I was reading Andy's answer to the question "Outputting difference in two Pandas dataframes side by side - highlighting the difference".

I have two questions regarding th

1 answer

爱一瞬间的悲伤

2020-12-06 15:55
Question 1

ne_stacked is a pd.Series that consists of True and False values that indicate where df1 and df2 are not equal.

ne_stacked[boolean_array] is a way to filter the series ne_stacked by eliminating the rows of ne_stacked where boolean_array is False and keeping the rows of ne_stacked where boolean_array is True.

It so happens that ne_stacked is itself a boolean array, so it can be used to filter itself. Why would we want to do this? So that we can see which index values remain after filtering, that is, the locations where the two frames differ.

So ne_stacked[ne_stacked] is a subset of ne_stacked with only True values.
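As a rough illustration of the self-filtering described above, here is a minimal sketch. The two small frames are made up for the example, and it assumes ne_stacked was built as (df1 != df2).stack(), which is not shown in this answer.

```python
import pandas as pd

# Hypothetical sample data: the frames differ at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 7]})

# Assumed construction: one boolean per (row, column) pair.
ne_stacked = (df1 != df2).stack()

# Filtering the Series by itself keeps only the True entries.
changed = ne_stacked[ne_stacked]

# The surviving index holds the (row, column) labels of every difference.
changed_locations = list(changed.index)
print(changed_locations)
```

The values of changed are all True by construction, so the useful part is its index.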

Question 2

np.where

np.where does two things. If you pass only a condition, as in np.where(df1 != df2), you get a tuple of arrays: the first holds the row positions and the second holds the matching column positions of every cell where the condition is True. I usually use it like this
```
i, j = np.where(df1 != df2)
```
Now I can get at all the elements of df1 or df2 where there are differences, like
```
df1.values[i, j]
```
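Or I can assign to those cells. Be aware that this only changes the DataFrame when .values returns a writable view of the underlying data (typically a single-dtype frame with copy-on-write disabled); otherwise the assignment may modify a temporary copy or fail on a read-only array.
```
df1.values[i, j] = -99
```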
Or lots of other useful things.
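A minimal sketch of the single-argument form, using made-up frames. It assigns into an explicit writable copy rather than through .values, which is a swapped-in choice here to avoid depending on whether .values is a view.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the frames differ at positions (1, 0) and (2, 1).
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 7]})

# i holds row positions, j holds the matching column positions.
i, j = np.where((df1 != df2).to_numpy())

# Paired fancy indexing pulls out exactly the differing cells.
old_values = df1.to_numpy()[i, j]
new_values = df2.to_numpy()[i, j]

# Assign into an explicit copy, then rebuild a labelled frame from it.
patched = df1.to_numpy(copy=True)
patched[i, j] = -99
patched_df = pd.DataFrame(patched, index=df1.index, columns=df1.columns)
print(patched_df)
```

Because the assignment goes into a copy, df1 itself is left unchanged.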

You can also use np.where as an if, then, else for arrays
```
np.where(df1 != df2, -99, 99)
```
This produces an array of the same shape as df1 and df2, with -99 wherever df1 != df2 and 99 everywhere else.
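A small sketch of the three-argument form on made-up data. Note that the result is a plain NumPy array, so the row and column labels have to be reattached if you want a DataFrame back.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the frames differ at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 7]})

# Elementwise if/else: -99 where the cells differ, 99 where they match.
flags = np.where((df1 != df2).to_numpy(), -99, 99)

# Reattach the labels to get a DataFrame again.
flags_df = pd.DataFrame(flags, index=df1.index, columns=df1.columns)
print(flags_df)
```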

df.where

df.where, on the other hand, takes a boolean condition as its first argument and returns an object of the same shape as df. Cells where the condition is True are kept; the rest become np.nan or, if a second argument is given, the value passed there.
```
df1.where(df1 != df2)
```
Or
```
df1.where(df1 != df2, -99)
```
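To make the two calls concrete, here is a minimal sketch on made-up frames showing both the default np.nan fill and an explicit replacement value.

```python
import pandas as pd

# Hypothetical sample data: the frames differ at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 7]})

# Keep df1's value where the frames differ; everything else becomes NaN.
kept = df1.where(df1 != df2)

# Same condition, but cells where the frames match are replaced with -99.
filled = df1.where(df1 != df2, -99)

print(kept)
print(filled)
```

Unlike np.where, both results are DataFrames that keep the index and columns of df1.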
Are they the same?
Clearly they are not the "same". But you can use them similarly
```
np.where(df1 != df2, df1, -99)
```
Should be the same as
```
df1.where(df1 != df2, -99).values
```
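The claimed equivalence can be checked on a small example. This is only a sketch on made-up integer data with no missing values; it does not prove the two always agree for every dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the frames differ at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 7]})

mask = df1 != df2

# NumPy version: works on raw arrays and returns an ndarray.
via_numpy = np.where(mask.to_numpy(), df1.to_numpy(), -99)

# pandas version: returns a DataFrame, converted here for comparison.
via_pandas = df1.where(mask, -99).to_numpy()

same = np.array_equal(via_numpy, via_pandas)
print(same)
```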