Question
I have a problem running the code below.
data (data_final in the code below) is my DataFrame. X is the list of columns used as training data, and L is a list of categorical features whose values are numeric.
I want to one-hot encode my categorical features, so I do the following. But the last line throws "ValueError: Columns must be same length as key", and even after a lot of research I still don't understand why.
def turn_dummy(df, prop):
    dummies = pd.get_dummies(df[prop], prefix=prop, sparse=True)
    df.drop(prop, axis=1, inplace=True)
    return pd.concat([df, dummies], axis=1)

L = ['A', 'B', 'C']
for col in L:
    data_final[X] = turn_dummy(data_final[X], col)
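For reference, here is a minimal sketch of one way to keep the loop working, assuming toy data (the frame contents, the columns "A", "B", "num" and the name train are all hypothetical). Each call to turn_dummy replaces one column with several dummy columns, so its result has more columns than X and cannot be written back into data_final[X]; keeping the growing frame in its own variable avoids that. The helper is adapted from the question's version: it drops the column without inplace=True and omits sparse=True for simplicity.

```python
import pandas as pd


def turn_dummy(df, prop):
    # Adapted helper: returns a new frame instead of mutating df in place
    # (sparse=True from the question is omitted for simplicity).
    dummies = pd.get_dummies(df[prop], prefix=prop)
    return pd.concat([df.drop(columns=prop), dummies], axis=1)


# Hypothetical toy data: "A" and "B" are numerically coded categoricals.
data_final = pd.DataFrame({
    "A": [1, 2, 1],
    "B": [0, 0, 1],
    "num": [0.5, 1.5, 2.5],
})
X = ["A", "B", "num"]  # hypothetical training columns
L = ["A", "B"]         # hypothetical categorical columns

# Every call adds columns, so the result no longer fits in data_final[X].
# Store it in a separate variable instead of assigning back to the slice.
train = data_final[X]
for col in L:
    train = turn_dummy(train, col)

print(list(train.columns))
```

Here train holds the encoded training features, and data_final itself is left unchanged.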
Answer 1:
This appears to be a problem of dimensionality. Here is an analogy.
Say I have a list like so:
mylist = [0, 0, 0, 0]
It has length 4. Now suppose I want to map the elements of a new list one-to-one onto it:
otherlist = ['a', 'b']
for i in range(len(mylist)):
    mylist[i] = otherlist[i]
This will obviously throw an IndexError, because it tries to read elements that otherlist simply doesn't have.
Much the same is happening here: you are trying to insert a string (len=1) into a column of length n > 1. Try:
data_final[X] = turn_dummy(data_final[X], L)
assuming len(L) == number_of_rows.
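The same kind of length mismatch can be reproduced in pandas directly. In this minimal sketch (the frames and column names are hypothetical), the right-hand side has three columns while the key lists only two, which is the situation the question's last line ends up in once get_dummies has added columns to the result.

```python
import pandas as pd

# Hypothetical target frame with two columns.
df = pd.DataFrame({"A": [1, 2], "num": [0.5, 1.5]})

# Hypothetical right-hand side with three columns, shaped like the
# output of get_dummies + concat on the frame above.
wider = pd.DataFrame({"num": [0.5, 1.5], "A_1": [1, 0], "A_2": [0, 1]})

message = None
try:
    df[["A", "num"]] = wider  # 2 keys, but 3 columns on the right
except ValueError as exc:
    message = str(exc)

print(message)
```

pandas checks the column count before assigning anything, so df keeps its original two columns after the failed assignment.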
Answer 2:
There is no reason to write your own function. pandas already has a function that does what you want:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
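As a sketch of that suggestion (toy data and column names are hypothetical), pd.get_dummies can take the whole frame plus a columns argument listing which columns to encode. It drops those columns and appends their dummy columns, so no helper or column-by-column loop is needed. Passing columns matters here because the categorical features are stored as numbers: without it, get_dummies only encodes object, string or category columns and leaves numeric ones untouched.

```python
import pandas as pd

# Hypothetical data: "A" and "B" are numerically coded categoricals.
data_final = pd.DataFrame({
    "A": [1, 2, 1],
    "B": [0, 0, 1],
    "num": [0.5, 1.5, 2.5],
})
L = ["A", "B"]

# Encode only the listed columns; the originals are replaced by dummies.
encoded = pd.get_dummies(data_final, columns=L)
print(list(encoded.columns))

# Without columns=, the numeric categoricals are not encoded at all.
untouched = pd.get_dummies(data_final)
print(list(untouched.columns))
```

get_dummies also accepts sparse=True, as used in the question's helper, if memory is a concern.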
Source: https://stackoverflow.com/questions/52428968/valueerror-columns-must-be-same-length-as-key