Question
I have 2 columns:
- Sex (with categorical values of type string as 'male' and 'female')
- Class (with categorical values of type integer as 1 to 10)
When I execute pd.get_dummies() on these 2 columns, only 'Sex' gets encoded into 2 dummy columns. 'Class' is left unchanged by get_dummies.
I want 'Class' to be converted into 10 dummy columns as well, similar to One Hot Encoding.
Is this expected behavior? Is there a workaround?
Answer 1:
You can convert values to strings:
df1 = pd.get_dummies(df.astype(str))
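Note that df.astype(str) turns every column into strings, so any genuinely numeric column would be dummy-encoded too. A minimal sketch of limiting the conversion to 'Class' only, assuming a small made-up frame where 'Age' is a hypothetical extra numeric column that is not part of the question:

```python
import pandas as pd

# Made-up sample data; 'Age' is a hypothetical numeric column added for illustration.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Class": [1, 2, 3, 1],
    "Age": [22, 38, 26, 35],
})

# Cast only 'Class' to string so get_dummies treats it as categorical,
# while 'Age' keeps its integer dtype and passes through untouched.
encoded = pd.get_dummies(df.astype({"Class": str}))
print(encoded.columns.tolist())
```

Here 'Age' should stay a single numeric column, while 'Sex' and 'Class' are expanded into one column per distinct value.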
Answer 2:
If you don't want to convert your data, you can use the 'columns' argument of get_dummies. Here is a quick walkthrough:
Here is the data frame reproduced per your description:
import pandas as pd
sex_labels = ['male', 'female']
# Alternate 'male' / 'female' over 10 rows
sex_col = [sex_labels[i % 2] for i in range(10)]
# Integer class labels 0..9
class_col = list(range(10))
df = pd.DataFrame({'sex': sex_col, 'class': class_col})
df['sex'] = pd.Categorical(df['sex'])
The dtypes are:
print(df.dtypes)
sex category
class int64
dtype: object
Apply get_dummies:
df = pd.get_dummies(df, columns=['sex', 'class'])
Verify:
print(df.columns)
Output:
Index(['sex_female', 'sex_male', 'class_0',
'class_1','class_2','class_3','class_4','class_5',
'class_6','class_7','class_8','class_9'],dtype='object')
Per the docs at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html:
"If columns is None then all the columns with object or category dtype will be converted."
This is why, without the 'columns' argument, you only see dummies for the sex column and not for class: class has an integer dtype, so it is skipped by default.
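To make the difference concrete, here is a small sketch comparing the default call, the 'columns' argument, and a third option not mentioned above (casting the integer column to the category dtype). The three-row frame is made up for illustration:

```python
import pandas as pd

# Made-up sample data: one string column and one integer column.
df = pd.DataFrame({"sex": ["male", "female", "male"], "class": [1, 2, 3]})

# Default: only object/category columns are encoded, so 'class' stays as-is.
default = pd.get_dummies(df)

# Explicit: name the columns to encode, regardless of their dtype.
explicit = pd.get_dummies(df, columns=["sex", "class"])

# Alternative: mark 'class' as categorical so the default rule picks it up.
as_category = pd.get_dummies(df.astype({"class": "category"}))

print(default.columns.tolist())
print(explicit.columns.tolist())
print(as_category.columns.tolist())
```

The last two calls should produce the same set of dummy columns. The category cast can be convenient when the column should be treated as categorical elsewhere in the pipeline as well.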
Hope this helps. Happy learning!
Note: Tested with pandas version '0.25.2'
Source: https://stackoverflow.com/questions/54569115/pandas-get-dummies-for-numeric-categorical-data