How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns

问题

I have the following data frame:

import pandas as pd

df = pd.DataFrame({'Manufacturer':['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
                   'System':['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
                  })

I would like to create another column in the data frame named Pricing, which contains the value "East Coast" if the following conditions hold:

a) if a substring in the Manufacturer column matches "Louis",

AND

b) if a substring in the System column matches "Platinum"

The following code operates on a single column:

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')

I tried to chain this together using AND:

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['Manufacturer'].str.contains('Platimum'), 'East Coast', 'None')

But, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`

Can anyone help with how I would implement a.any() or a.all() given the two conditions "a" and "b" above? Or, perhaps there is a more efficient way to create this column without using np.where?

Thanks in advance!

回答1:

Using .loc to slice the dataframe, according to your conditions:

df.loc[(df['Manufacturer'].str.contains('Louis')) & 
       (df['System'].str.contains('Platinum')),
      'Pricing'] = 'East Coast'
df

    Manufacturer        System       Pricing
0   Allen Edmonds       None         NaN
1   Louis Vuitton 23    None         NaN
2   Louis Vuitton 8 14  Platinum     East Coast
3   Gulfstream          Gold         NaN
4   Bombardier          None         NaN
5   23 - Louis Vuitton  Platinum 905 East Coast
6   Louis Vuitton 20    None         NaN

回答2:

def contain(x):
    if 'Louis' in x.Manufacturer and 'Platinum' in x.System:
        return "East Coast" 

df['pricing'] = df.apply(lambda x:contain(x),axis = 1)

来源：https://stackoverflow.com/questions/64839924/how-to-create-a-column-in-a-pandas-dataframe-based-on-a-conditional-substring-se

标签

python

pandas

numpy