I am trying to use a Boolean mask to get a match from 2 different dataframes.
Using the logical OR operator:
x = df[(df['A'].isin(df2['B']))
As far as I understand this issue (I come from a C++ background and am currently learning Python for data science), the key point is that the bitwise operators (&, |) can be overloaded by classes in Python, just as in C++. Several posts I stumbled upon suggest the same.
Applied to plain integers, these operators work on the individual bits of the operands and give you the resulting number. So for instance, if you have the following:
1 | 2 # will result in 3
What Python will actually do is compare the bits of these numbers:
00000001 | 00000010
The result will be:
00000011 (because 0 | 0 is False, ergo 0; and 0 | 1 is True, ergo 1)
As an integer: 3
It combines each pair of corresponding bits and spits out the number formed by the results. (The eight digits shown here are just for illustration; Python integers are not limited to eight bits.) This is the normal behaviour of these operators.
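If you want to check the integer example yourself, a minimal sketch could look like this. The variable names are only illustrative, and the 8-digit padding is purely for display.

```python
a = 1
b = 2

# format(..., "08b") pads the binary representation to 8 digits for display
print(format(a, "08b"))       # 00000001
print(format(b, "08b"))       # 00000010

result = a | b                # bitwise OR, applied bit by bit
print(format(result, "08b"))  # 00000011
print(result)                 # 3

and_result = a & b            # bitwise AND: no bit is set in both numbers
print(and_result)             # 0
```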
Enter pandas. Because these operators can be overloaded, pandas has made use of this. With pandas objects, the bitwise operators are typically used like this:
(dataframe1['column'] == "expression") & (dataframe1['column'] != "another expression")
In this case, pandas will first create a Series of True/False values for each of the == and != comparisons. Be careful: you have to put parentheses around each comparison, because in Python the bitwise operators (&, |) bind more tightly than the comparison operators, so without parentheses the & would be evaluated first! Each comparison checks every value in the column against the expression and outputs either True or False.
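The precedence point can be seen with plain integers, without pandas at all. This is just a small sketch with made-up values:

```python
# Without parentheses, & is evaluated before ==, so this is parsed as
# 1 == (1 & 2) == 2, i.e. the chained comparison 1 == 0 == 2
without_parens = 1 == 1 & 2 == 2

# With parentheses, each comparison is evaluated first: True & True
with_parens = (1 == 1) & (2 == 2)

print(without_parens)  # False
print(with_parens)     # True
```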
Then you have two same-length Series of True/False values. What it THEN does is take these two Series and combine them element by element with either "and" (&) or "or" (|), and finally spit out one single Boolean Series that tells you, row by row, whether the combined condition is fulfilled.
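Here is a rough sketch of those steps. The dataframe and its values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    {"column": ["expression", "another expression", "something else", "expression"]}
)

# Step 1: each comparison produces its own Boolean Series
is_expr = df["column"] == "expression"
not_other = df["column"] != "another expression"

# Step 2: & combines the two Series element by element
mask = is_expr & not_other
print(mask.tolist())    # [True, False, False, True]

# | works the same way, but with "or" semantics
either = (df["column"] == "expression") | (df["column"] == "something else")
print(either.tolist())  # [True, False, True, True]

# Step 3: use the mask to select the matching rows
filtered = df[mask]
print(filtered)
```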
To go even further, what happens under the hood is that Python translates the & operator into a call to the special method __and__ of the left operand (here, the Series on the left), passing the Series on the right as the argument. pandas implements that method so that the two Series are combined one pair of values at a time, giving a True or False for each pair.
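To illustrate the mechanism (not how pandas actually implements it), here is a toy class that overloads & and | through the special methods __and__ and __or__. The class name BoolList is hypothetical and exists only for this sketch.

```python
class BoolList:
    """Toy stand-in for a Boolean Series (for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Python calls this method for `self & other`
        return BoolList(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        # Python calls this method for `self | other`
        return BoolList(a or b for a, b in zip(self.values, other.values))


left = BoolList([True, False, True])
right = BoolList([True, True, False])

print((left & right).values)  # [True, False, False]
print((left | right).values)  # [True, True, True]
```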
This is basically the same principle they've used for all other operators as well (>, <, >=, <=, ==, !=).
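Why go through the struggle of using & when you've got the nice and neat "and"? Well, that is because "and" cannot be overloaded: it is built into the language and always asks for the truth value of each operand as a whole, which pandas refuses to give for a Series (it raises a ValueError saying the truth value is ambiguous).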
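You can see the difference with two small Series. Again, this is only a sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([True, False, True])
s2 = pd.Series([True, True, False])

# "and" needs a single truth value for s1, which pandas refuses to provide
try:
    s1 and s2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# & is overloaded by pandas and works element by element
combined = s1 & s2
print(combined.tolist())  # [True, False, False]
```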
Hope that helps!