I have a CSV file like this:
Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Jus
I suggest using str.get_dummies:
df = df.join(df.pop('Fruit_Color').str.get_dummies(','))
print(df)
  Fruit_Type Fruit_Description  Green  Red  Yellow
0      Apple     Just an apple      1    1       1
1     Banana     Just a Banana      1    0       1
2     Orange    Just an Orange      0    1       1
3      Grape      Just a Grape      0    0       0
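For reference, here is a self-contained sketch of the same approach that can be run without a file on disk. The inline data is an assumption reconstructed from the question and the output above; in particular, the Grape row with an empty colour field is only there to show the all-zero case.

```python
import io
import pandas as pd

# Sample data modelled on the question; the Grape row with an empty
# colour field is an assumption, not part of the original file.
csv_text = """Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Just an Orange
Grape;;Just a Grape
"""

# The file is semicolon-delimited, so sep=";" keeps the comma-separated
# colours together in a single column.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# pop removes Fruit_Color from df and returns it as a Series;
# str.get_dummies splits each value on "," and builds one 0/1 column per colour.
dummies = df.pop("Fruit_Color").str.get_dummies(",")
df = df.join(dummies)
print(df)
```

A missing colour value produces a row of zeros rather than an error, which is why the Grape row ends up as 0 0 0.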
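You can create the columns using assign:
df.assign(
    # The column is named Fruit_Color (capital C) in the CSV header.
    # na=False treats a missing colour as "no match" instead of NaN.
    green=lambda d: d['Fruit_Color'].str.contains('Green', case=True, na=False),
    red=lambda d: d['Fruit_Color'].str.contains('Red', case=True, na=False),
    yellow=lambda d: d['Fruit_Color'].str.contains('Yellow', case=True, na=False),
)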
This results in a new dataframe with three additional columns of Booleans, namely "green", "red" and "yellow".
To detect a row with no known colour, you can also pass other_color=lambda d: ~(d['green'] | d['red'] | d['yellow']) as a further keyword argument to the same assign call, after the three colour columns.
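Put together, the assign approach might look like the sketch below. The inline data (including the Grape row with no colour) and the variable name result are assumptions for illustration. It relies on assign evaluating keyword arguments in order, so other_color can refer to the columns created just before it.

```python
import io
import pandas as pd

# Assumed sample data; the Grape row with an empty colour is hypothetical.
csv_text = """Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Just an Orange
Grape;;Just a Grape
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";")

result = df.assign(
    # na=False makes a missing colour count as False rather than NaN,
    # so the columns stay Boolean and ~ / | work as expected.
    green=lambda d: d["Fruit_Color"].str.contains("Green", na=False),
    red=lambda d: d["Fruit_Color"].str.contains("Red", na=False),
    yellow=lambda d: d["Fruit_Color"].str.contains("Yellow", na=False),
    # True only when none of the three known colours matched.
    other_color=lambda d: ~(d["green"] | d["red"] | d["yellow"]),
)
print(result)
```

Unlike the get_dummies version, this keeps the original Fruit_Color column and only creates columns for the colours you name explicitly.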
Another possibility is to use pandas.concat to concatenate multiple dataframes, but it's less elegant than the above solution.
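The answer does not show what the concat variant would look like; one possible reading, sketched here under the assumption that the colour columns are built separately and then glued on column-wise, is the following. The names colours and combined are hypothetical.

```python
import io
import pandas as pd

# Assumed sample data based on the rows shown in the question.
csv_text = """Fruit_Type;Fruit_Color;Fruit_Description
Apple;Green,Red,Yellow;Just an apple
Banana;Green,Yellow;Just a Banana
Orange;Red,Yellow;Just an Orange
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Build the indicator columns as their own dataframe ...
colours = df["Fruit_Color"].str.get_dummies(",")

# ... then concatenate side by side (axis=1); rows are aligned on the index.
combined = pd.concat([df.drop(columns="Fruit_Color"), colours], axis=1)
print(combined)
```

This needs an explicit drop and an intermediate dataframe, which is roughly why the join and assign versions read more cleanly.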