Say I have a list BRANDS that contains brand names:
BRANDS = ['Samsung', 'Apple', 'Nike', .....]
Dataframe A has the following structure
One approach is to use apply():
import pandas as pd

BRANDS = ['Samsung', 'Apple', 'Nike']

def get_brand_name(row):
    if pd.notnull(row['brand_name']):
        # don't do anything if brand_name is not null
        return row['brand_name']
    item_title = row['item_title']
    # title-case each word so it can match the entries in BRANDS
    title_words = map(str.title, item_title.split())
    for tw in title_words:
        if tw in BRANDS:
            # return first 'match'
            return tw
    # default return None
    return None

df['brand_name'] = df.apply(get_brand_name, axis=1)
print(df)
#   row     item_title brand_name
#0    1       Apple 6S      Apple
#1    2  Nike BB Shoes       Nike
#2    3     Samsung TV    Samsung
#3    4      Used bike       None
Notes
You may want to make BRANDS a set instead of a list, because membership tests (tw in BRANDS) will be faster. However, a set is unordered, so this won't work if you care about the order of BRANDS.
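For what it's worth, here is a rough sketch of the set-based variant, written so it runs on its own. The sample DataFrame is hypothetical: its row and item_title values are copied from the printed output above, and the brand_name column is assumed to start out empty. Your real data will differ.

```python
import pandas as pd

# Assumption: BRANDS stored as a set, so `in` is a hash lookup
# instead of a scan over a list.
BRANDS = {'Samsung', 'Apple', 'Nike'}

def get_brand_name(row):
    # Keep an existing brand_name untouched.
    if pd.notnull(row['brand_name']):
        return row['brand_name']
    # Return the first title-cased word of the title that is a known brand.
    for word in row['item_title'].split():
        candidate = word.title()
        if candidate in BRANDS:
            return candidate
    # No word matched any brand.
    return None

# Hypothetical sample data; brand_name is assumed to be missing everywhere.
df = pd.DataFrame({
    'row': [1, 2, 3, 4],
    'item_title': ['Apple 6S', 'Nike BB Shoes', 'Samsung TV', 'Used bike'],
    'brand_name': [None, None, None, None],
})

df['brand_name'] = df.apply(get_brand_name, axis=1)
print(df)
```

The first match is still decided by the order of the words in item_title, not by the order of BRANDS, so switching to a set should not change the result for this particular function.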