How to elegantly one hot encode a series of lists in pandas [duplicate]

误落风尘

1 answer

天涯浪人

2021-02-09 02:27

MultiLabelBinarizer from scikit-learn is more efficient for this kind of problem and should be preferred over calling apply with pd.Series. Here's a demo:

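import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Each element of the Series is a list of labels
test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])

mlb = MultiLabelBinarizer()

# fit_transform returns a 0/1 array with one column per distinct label;
# mlb.classes_ holds those labels in sorted order
res = pd.DataFrame(mlb.fit_transform(test),
                   columns=mlb.classes_,
                   index=test.index)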

Result

   a  b  c  d  e
0  1  1  0  0  1
1  1  0  1  0  0
2  0  0  0  1  0
3  0  0  0  1  0
4  0  0  0  0  1
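
For comparison, here is a rough pandas-only sketch that should produce the same table. It is not part of the original answer: the name alt and the explode/get_dummies route are assumptions chosen for illustration, and Series.explode needs pandas 0.25 or later.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])

# The approach from the answer
mlb = MultiLabelBinarizer()
res = pd.DataFrame(mlb.fit_transform(test),
                   columns=mlb.classes_,
                   index=test.index)

# pandas-only alternative (illustrative): one row per label, dummy-encode,
# then collapse back to one row per original index
alt = (pd.get_dummies(test.explode())
         .groupby(level=0)
         .max()
         .astype(int))

# Both routes should agree on columns and values
print(alt.values.tolist() == res.values.tolist())
```

On larger inputs the MultiLabelBinarizer route is likely to be the faster of the two, which matches the recommendation above, but it is worth timing on your own data.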
