Question
I have the following data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Manufacturer':['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
'System':['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
})
I would like to create another column in the data frame named Pricing, which contains the value "East Coast" if both of the following conditions hold:
a) a substring in the Manufacturer column matches "Louis", AND
b) a substring in the System column matches "Platinum"
The following code operates on a single column:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')
I tried to chain this together using AND:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['Manufacturer'].str.contains('Platimum'), 'East Coast', 'None')
But, I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`
Can anyone help with how I would implement a.any() or a.all() given the two conditions "a" and "b" above? Or perhaps there is a more efficient way to create this column without using np.where?
Thanks in advance!
Answer 1:
Use .loc to select the rows that satisfy both conditions and assign the value to them:
df.loc[(df['Manufacturer'].str.contains('Louis')) &
(df['System'].str.contains('Platinum')),
'Pricing'] = 'East Coast'
df
         Manufacturer        System     Pricing
0       Allen Edmonds          None         NaN
1    Louis Vuitton 23          None         NaN
2     Louis Vuitton 8   14 Platinum  East Coast
3          Gulfstream          Gold         NaN
4          Bombardier          None         NaN
5  23 - Louis Vuitton  Platinum 905  East Coast
6    Louis Vuitton 20          None         NaN
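If you would rather keep np.where, as in the question, the error comes from Python's `and`, which tries to reduce each whole array to a single True/False. The element-wise operator `&` combines the two boolean Series row by row instead. A minimal sketch, assuming the same data as in the question and that non-matching rows should get the string 'None' (the name `mask` is just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8',
                     'Gulfstream', 'Bombardier', '23 - Louis Vuitton',
                     'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None',
               'Platinum 905', 'None'],
})

# Combine both conditions element-wise with & (not the Python keyword `and`).
mask = (df['Manufacturer'].str.contains('Louis')
        & df['System'].str.contains('Platinum'))

# One np.where call on the combined mask fills every row.
df['Pricing'] = np.where(mask, 'East Coast', 'None')
```

Unlike the .loc assignment, this fills non-matching rows with 'None' rather than leaving them as NaN.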
Answer 2:
def contain(x):
    # Return "East Coast" only when both substrings are present in the row.
    if 'Louis' in x.Manufacturer and 'Platinum' in x.System:
        return "East Coast"
    return None

df['Pricing'] = df.apply(contain, axis=1)
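One caveat with the row-wise check: if either column can contain missing values, `'Louis' in x.Manufacturer` raises a TypeError, because a float NaN is not iterable. A possible workaround, sketched here on made-up sample data (not the data from the question), is to use str.contains with na=False so that missing values simply count as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values in both columns.
df = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', np.nan, 'Gulfstream'],
    'System': ['14 Platinum', 'Platinum 905', None],
})

# na=False makes missing entries evaluate to False instead of NaN.
mask = (df['Manufacturer'].str.contains('Louis', na=False)
        & df['System'].str.contains('Platinum', na=False))

df['Pricing'] = np.where(mask, 'East Coast', 'None')
```

This also avoids calling a Python function once per row, which tends to matter on larger frames.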
Source: https://stackoverflow.com/questions/64839924/how-to-create-a-column-in-a-pandas-dataframe-based-on-a-conditional-substring-se