MultiLabelBinarizer from the sklearn library is more efficient for these problems. It should be preferred over using apply with pd.Series. Here's a demo:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])
mlb = MultiLabelBinarizer()
res = pd.DataFrame(mlb.fit_transform(test),
                   columns=mlb.classes_,
                   index=test.index)
Result:
   a  b  c  d  e
0  1  1  0  0  1
1  1  0  1  0  0
2  0  0  0  1  0
3  0  0  0  1  0
4  0  0  0  0  1
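
For comparison, here is a rough sketch of what an apply-based version might look like, checked against the MultiLabelBinarizer output. The exact apply-based code being compared is an assumption on my part (the chain of stack, get_dummies and groupby is just one common way to write it), and the names via_apply, via_mlb and same are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])

# Assumed apply-based alternative: expand each list into columns,
# stack into one long Series, one-hot encode, then collapse back per row.
via_apply = (pd.get_dummies(test.apply(pd.Series).stack())
             .groupby(level=0).max()
             .astype(int))

# MultiLabelBinarizer version, same as in the demo.
mlb = MultiLabelBinarizer()
via_mlb = pd.DataFrame(mlb.fit_transform(test),
                       columns=mlb.classes_,
                       index=test.index)

# Both approaches should give the same columns and the same 0/1 values.
same = (list(via_apply.columns) == list(via_mlb.columns)
        and (via_apply.values == via_mlb.values).all())
print(same)
```

The apply-based version builds an intermediate DataFrame with one row per list, while MultiLabelBinarizer goes straight to the indicator matrix. To see the difference on your own data, you can time both with timeit.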