Question
I have a dataframe, users, with several columns. My goal is to add a column [uses_name] that is True when a user's password is the same as that user's first or last name.
For example, [user_name] in the twelfth row contains milford.hubbard, so [uses_name] should be True there, because [password] and [last_name] are the same.
To do this, I create two columns, [first_name] and [last_name], with regular expressions. When I create [uses_name], I run into trouble with the | operator. I have read the pandas documentation on Boolean indexing but did not find an answer.
My code:
import pandas as pd
users = pd.read_csv('datasets/users.csv')
# Extracting first and last names into their own columns
users['first_name'] = users['user_name'].str.extract(r'(^\w+)', expand=False)
users['last_name'] = users['user_name'].str.extract(r'(\w+$)', expand=False)
# Flagging the users with passwords that matches their names
users['uses_name'] = users['password'].isin(users['first_name'] | users['last_name'])
# Counting and printing the number of users using names as passwords
print(users['uses_name'].count())
# Taking a look at the 12 first rows
print(users.head(12))
When I run this, I get an error:
TypeError: unsupported operand type(s) for |: 'str' and 'bool'
First 12 rows of the users dataframe with the created first_name and last_name columns:
id user_name password first_name last_name
0 1 vance.jennings joobheco vance jennings
1 2 consuelo.eaton 0869347314 consuelo eaton
2 3 mitchel.perkins fabypotter mitchel perkins
3 4 odessa.vaughan aharney88 odessa vaughan
4 5 araceli.wilder acecdn3000 araceli wilder
5 6 shawn.harrington 5278049 shawn harrington
6 7 evelyn.gay master evelyn gay
7 8 noreen.hale murphy noreen hale
8 9 gladys.ward lwsves2 gladys ward
9 10 brant.zimmerman 1190KAREN5572497 brant zimmerman
10 11 leanna.abbott aivlys24 leanna abbott
11 12 milford.hubbard hubbard milford hubbard
Answer 1:
This works:
users['uses_name'] = (users['password'] == users['first_name']) | (users['password'] == users['last_name'])
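As a minimal, self-contained sketch of this row-wise comparison: the three sample rows below are hypothetical stand-ins for datasets/users.csv, and matches_first and matches_last are illustrative names that do not appear in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for datasets/users.csv
users = pd.DataFrame({
    "user_name": ["noreen.hale", "milford.hubbard", "evelyn.gay"],
    "password": ["murphy", "hubbard", "evelyn"],
})

# Same extraction as in the question
users["first_name"] = users["user_name"].str.extract(r"(^\w+)", expand=False)
users["last_name"] = users["user_name"].str.extract(r"(\w+$)", expand=False)

# Each comparison yields a Boolean Series; | then combines them element-wise
matches_first = users["password"] == users["first_name"]
matches_last = users["password"] == users["last_name"]
users["uses_name"] = matches_first | matches_last

print(users["uses_name"].tolist())    # [False, True, True]
print(int(users["uses_name"].sum()))  # 2
```

Note that sum() counts the True values, whereas count() returns the number of non-null entries regardless of their value, so sum() is probably the one wanted for "number of users using names as passwords".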
Answer 2:
You can use pd.concat, since both of them are Series:
users['password'].isin(pd.concat([users['first_name'],users['last_name']]))
Since you changed the question, here is an updated version:
users[['first_name', 'last_name']].eq(users['password'], axis=0).any(axis=1)
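The two snippets in this answer are not interchangeable, which a small example may make clearer. The sketch below uses hypothetical data in which one user's password happens to be another user's first name; pooled, any_user, and same_row are illustrative names.

```python
import pandas as pd

# Hypothetical data: row 0's password equals row 1's first name
users = pd.DataFrame({
    "first_name": ["vance", "evelyn"],
    "last_name": ["jennings", "gay"],
    "password": ["evelyn", "master"],
})

# isin against the pooled names: True if the password matches ANY user's name
pooled = pd.concat([users["first_name"], users["last_name"]])
any_user = users["password"].isin(pooled)

# Row-wise eq: True only if the password matches the SAME row's names
same_row = users[["first_name", "last_name"]].eq(users["password"], axis=0).any(axis=1)

print(any_user.tolist())  # [True, False]
print(same_row.tolist())  # [False, False]
```

If the goal is to flag users whose password is their own first or last name, as the question describes, the row-wise form appears to be the closer match; the isin form answers the broader question of whether the password is anyone's name in the dataset.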
Answer 3:
Use numpy.union1d:
import numpy as np
val = np.union1d(users['first_name'], users['last_name'])
users['uses_name'] = users['password'].isin(val)
print(users)
id user_name password first_name last_name uses_name
0 1 vance.jennings joobheco vance jennings False
1 2 consuelo.eaton 0869347314 consuelo eaton False
2 3 mitchel.perkins fabypotter mitchel perkins False
3 4 odessa.vaughan aharney88 odessa vaughan False
4 5 araceli.wilder acecdn3000 araceli wilder False
5 6 shawn.harrington 5278049 shawn harrington False
6 7 evelyn.gay master evelyn gay False
7 8 noreen.hale murphy noreen hale False
8 9 gladys.ward lwsves2 gladys ward False
9 10 brant.zimmerman 1190KAREN5572497 brant zimmerman False
10 11 leanna.abbott aivlys24 leanna abbott False
11 12 milford.hubbard hubbard milford hubbard True
Answer 4:
I think the best approach would be to perform a set union and pass that to isin:
users['uses_name'] = users['password'].isin(
set(users['first_name']).union(users['last_name'])
)
users
id user_name password first_name last_name uses_name
0 1 vance.jennings joobheco vance jennings False
1 2 consuelo.eaton 0869347314 consuelo eaton False
2 3 mitchel.perkins fabypotter mitchel perkins False
3 4 odessa.vaughan aharney88 odessa vaughan False
4 5 araceli.wilder acecdn3000 araceli wilder False
5 6 shawn.harrington 5278049 shawn harrington False
6 7 evelyn.gay master evelyn gay False
7 8 noreen.hale murphy noreen hale False
8 9 gladys.ward lwsves2 gladys ward False
9 10 brant.zimmerman 1190KAREN5572497 brant zimmerman False
10 11 leanna.abbott aivlys24 leanna abbott False
11 12 milford.hubbard hubbard milford hubbard True
Note that | is the element-wise logical OR for Boolean Series; it has no meaning for string columns in pandas.
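To illustrate that note, here is a small sketch with hypothetical values. It assumes object-dtype string Series, as read_csv typically produced at the time of the question; the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical object-dtype string Series
first = pd.Series(["vance", "milford"], dtype=object)
last = pd.Series(["jennings", "hubbard"], dtype=object)
password = pd.Series(["joobheco", "hubbard"], dtype=object)

# Comparisons produce Boolean Series, which | can combine element-wise
uses_name = (password == first) | (password == last)
print(uses_name.tolist())  # [False, True]

# Applying | directly to the string Series is not a defined operation
try:
    first | last
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```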
Source: https://stackoverflow.com/questions/49364654/typeerror-unsupported-operand-types-for-str-and-bool