I was reading Andy's answer to the question "Outputting difference in two Pandas dataframes side by side - highlighting the difference", and I have two questions regarding it.
Question 1
ne_stacked is a pd.Series that consists of True and False values that indicate where df1 and df2 are not equal.
ne_stacked[boolean_array] is a way to filter the series ne_stacked by eliminating the rows of ne_stacked where boolean_array is False and keeping the rows of ne_stacked where boolean_array is True.
It so happens that ne_stacked is also a boolean array, and so it can be used to filter itself. Why would we want to do this? So we can see what the values of the index are after we've filtered.
So ne_stacked[ne_stacked] is a subset of ne_stacked with only True values.
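A small sketch of the self-filtering idea, in case it helps. The sample frames are made up, and I am assuming ne_stacked is built as (df1 != df2).stack(), which is how I read the linked answer; adjust if yours is constructed differently.

```python
import pandas as pd

# Hypothetical sample frames; only the cell at row 1, column "b" differs.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 5]})

# Assumed construction: element-wise comparison, stacked into a boolean
# Series indexed by (row label, column label).
ne_stacked = (df1 != df2).stack()

# Filtering the Series by itself keeps only the entries that are True.
changed = ne_stacked[ne_stacked]

# The index that survives lists the (row, column) pairs that differ.
changed_locations = list(changed.index)
print(changed_locations)
```

The values of changed are all True by construction, so the interesting part is its index.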
Question 2
np.where
np.where does two things. If you only pass a condition, as in np.where(df1 != df2), you get a tuple of arrays: the first holds the row positions and the second holds the matching column positions of every cell where the condition is True. I usually use it like this:
i, j = np.where(df1 != df2)
Now I can get at all the elements of df1 or df2 where there are differences, like this:
df.values[i, j]
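Or I can assign to those cells (this only changes df itself when df.values is a writable view of the data, which is not guaranteed for mixed dtypes or with Copy-on-Write enabled)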
df.values[i, j] = -99
Or lots of other useful things.
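Here is a minimal sketch of the one-argument form on two made-up frames. The names left, right and patched are just for illustration. I assign on an explicit copy from to_numpy(copy=True) rather than on .values, so the example does not depend on whether .values happens to be a writable view.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames; cells (0, "b") and (1, "a") differ.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [1, 0], "b": [9, 4]})

# One-argument form: integer positions where the condition is True.
i, j = np.where(df1 != df2)

# Read the differing cells out of each frame.
left = df1.to_numpy()[i, j]
right = df2.to_numpy()[i, j]

# Assign to those cells on an explicit copy, leaving df1 untouched.
patched = df1.to_numpy(copy=True)
patched[i, j] = -99

print(i, j)
print(left, right)
print(patched)
```

Note that i and j are positions, not labels, so they line up with the underlying array rather than with a non-default index.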
You can also use np.where as an if, then, else for arrays:
np.where(df1 != df2, -99, 99)
This produces an array the same shape as df1 or df2, with -99 in all the places where df1 != df2 and 99 in the rest.
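A quick sketch of the three-argument form, again on made-up frames. One thing worth seeing is that the result is a plain NumPy array, so the labels are gone; wrapping it back into a DataFrame (flags_df is my own name) is one way to restore them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames; cells (0, "b") and (1, "a") differ.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [1, 0], "b": [9, 4]})

# Three-argument form: -99 where the condition holds, 99 elsewhere.
flags = np.where(df1 != df2, -99, 99)

# The result is an ndarray; reattach the labels if they are needed.
flags_df = pd.DataFrame(flags, index=df1.index, columns=df1.columns)

print(flags)
print(flags_df)
```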
df.where
On the other hand, df.where takes a boolean condition as its first argument and returns an object of the same shape as df, in which the cells where the condition is True are kept and the rest are replaced either with np.nan or with the value passed as the second argument of df.where:
df1.where(df1 != df2)
Or
df1.where(df1 != df2, -99)
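To make the two calls concrete, here is a small sketch on made-up frames (kept_or_nan and kept_or_fill are illustrative names). Both results stay DataFrames and keep the index and columns of df1.

```python
import pandas as pd

# Hypothetical sample frames; cells (0, "b") and (1, "a") differ.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [1, 0], "b": [9, 4]})

# Without a second argument, cells where the condition is False become NaN.
kept_or_nan = df1.where(df1 != df2)

# With a second argument, those cells get the replacement value instead.
kept_or_fill = df1.where(df1 != df2, -99)

print(kept_or_nan)
print(kept_or_fill)
```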
Are they the same?
Clearly they are not the "same", but you can use them similarly:
np.where(df1 != df2, df1, -99)
Should be the same as
df1.where(df1 != df2, -99).values
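A rough way to check that claim on a small case. The frames are made up and all-integer; I would not assume the two results match exactly for mixed dtypes, since the two routes can settle on different result dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical all-integer sample frames.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [1, 0], "b": [9, 4]})

# NumPy route: keep df1 where the frames differ, -99 elsewhere.
via_numpy = np.where(df1 != df2, df1, -99)

# pandas route: same rule, then drop down to the underlying array.
via_pandas = df1.where(df1 != df2, -99).values

# Compare the two arrays element by element.
same = np.array_equal(via_numpy, via_pandas)
print(same)
```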