Question
I am working with a dataframe that contains two columns of ID numbers. For further research I want to create dummy variables from these ID numbers, combining both columns into one set of indicators. My code, however, does not merge the two columns. How can I merge the two columns and create the dummy variables?
Dataframe
import pandas as pd
import numpy as np
d = {'ID1': [1,2,3], 'ID2': [2,3,4]}
df = pd.DataFrame(data=d)
Current code
pd.get_dummies(df, prefix = ['ID1', 'ID2'], columns=['ID1', 'ID2'])
Desired output
p = {'1': [1,0,0], '2': [1,1,0], '3': [0,1,1], '4': [0,0,1]}
df2 = pd.DataFrame(data=p)
df2
Answer 1:
If you need indicators in the output, use max; if you need counts, use sum. Apply either one after calling get_dummies with different parameters and casting the values to strings:
# Written for pandas < 2.0: the level= argument of max()/sum() was removed in pandas 2.0.
df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='').max(level=0, axis=1)
# Count alternative:
# df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='').sum(level=0, axis=1)
print(df)
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
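On pandas 2.0 and later, the same idea can be sketched without the `level=` argument by transposing, grouping on the duplicated labels, and transposing back. This is a minimal sketch and not part of the original answer; `dtype=int` is added on the assumption that 0/1 output is wanted (newer pandas returns booleans by default), and the names `dummies`, `indicators`, and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID1': [1, 2, 3], 'ID2': [2, 3, 4]})

# One-hot encode both columns with bare value labels ('1', '2', ...).
# IDs that occur in both columns produce duplicate column names.
dummies = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)

# Collapse the duplicate names: transpose, group by label, reduce, transpose back.
indicators = dummies.T.groupby(level=0).max().T  # 1 if the ID appears in the row
counts = dummies.T.groupby(level=0).sum().T      # how many times the ID appears

print(indicators)
```

For the sample data the two results coincide, because no row holds the same ID in both columns.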
Answer 2:
There is more than one way to skin a cat; here's how I'd do it, using an additional groupby:
# groupby(..., axis=1) is deprecated as of pandas 2.1.
# Count alternative:
# pd.get_dummies(df.astype(str)).groupby(lambda x: x.split('_')[1], axis=1).sum()
pd.get_dummies(df.astype(str)).groupby(lambda x: x.split('_')[1], axis=1).max()
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
Another option is stacking, if you like conciseness:
# The level= argument of max()/sum() was removed in pandas 2.0.
# Count alternative:
# pd.get_dummies(df.stack()).sum(level=0)
pd.get_dummies(df.stack()).max(level=0)
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
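For recent pandas releases, both approaches in this answer can be approximated as follows. This is a hedged sketch rather than the answerer's code: it swaps `groupby(..., axis=1)` for a transpose and `max(level=0)` for `groupby(level=0).max()`, and it adds `dtype=int` on the assumption that 0/1 output is preferred over booleans. The names `wide`, `by_suffix`, and `by_stack` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID1': [1, 2, 3], 'ID2': [2, 3, 4]})

# groupby variant: dummy columns are named 'ID1_1', 'ID2_2', and so on.
# Transpose so the names become the index, group on the part after '_',
# reduce, and transpose back.
wide = pd.get_dummies(df.astype(str), dtype=int)
by_suffix = wide.T.groupby(lambda name: name.split('_')[1]).max().T

# stack variant: build one long Series indexed by (row, column),
# encode it, then reduce per original row.
by_stack = pd.get_dummies(df.stack(), dtype=int).groupby(level=0).max()

print(by_suffix)
print(by_stack)
```

Replace `.max()` with `.sum()` in either variant to get counts instead of indicators. Note that the stack variant keeps the IDs as integer column labels, while the groupby variant yields string labels.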
Source: https://stackoverflow.com/questions/55182909/create-dummy-variable-of-multiple-columns-with-python