Question
I have a pandas DataFrame with 10 columns. If I try to access a column that is not present, it still returns NaN for that column. I was expecting a KeyError. Why does pandas not flag the missing column?
In the example below, vendor_id is a valid column in the DataFrame. The other column is absent from the dataset.
final_feature.ix[:,['vendor_id','this column is absent']]
Out[1017]:
vendor_id this column is absent
0 434236 NaN
type(final_feature)
Out[1016]: pandas.core.frame.DataFrame
EDIT 1: I verified that there are no null values:
print (final_feature1.isnull().values.any())
Answer 1:
This is expected behaviour and is due to the "setting with enlargement" feature:
In [15]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df.ix[:,['a','d']]
Out[15]:
a d
0 -1.164349 NaN
1 0.400116 NaN
2 -0.599496 NaN
3 0.186837 NaN
4 0.385656 NaN
If you try df['d'] or df[['a','d']] then you will get a KeyError.
Effectively, what you're doing is reindexing: when using ix, the fact that the column doesn't exist doesn't matter, and you'll just get a column of NaNs.
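The reindexing point can be made explicit with DataFrame.reindex, which accepts labels that may not exist. A minimal sketch, assuming a small frame shaped like the one above (the values are illustrative, not the random ones shown earlier):

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c (values are made up for the example).
df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), columns=list("abc"))

# reindex returns the requested labels; labels that do not exist are filled with NaN.
reindexed = df.reindex(columns=["a", "d"])
print(reindexed)

# Plain [] selection with a list of labels raises KeyError for the missing one.
try:
    df[["a", "d"]]
    bracket_raised = False
except KeyError as exc:
    bracket_raised = True
    print("KeyError:", exc)
```

Using reindex makes the intent ("give me these labels, NaN where absent") visible in the code rather than relying on indexer behaviour.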
The same behaviour is observed using loc:
In [24]:
df.loc[:,['a','d']]
Out[24]:
a d
0 -1.164349 NaN
1 0.400116 NaN
2 -0.599496 NaN
3 0.186837 NaN
4 0.385656 NaN
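Note that the loc result above depends on the pandas version: in releases from 1.0 onward, passing loc a list that contains a missing label raises KeyError instead of returning NaN. A version-tolerant sketch (frame values are illustrative) that falls back to reindex when loc refuses the missing label:

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c.
df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), columns=list("abc"))

try:
    # Older pandas: returns a NaN column for "d".
    result = df.loc[:, ["a", "d"]]
    loc_raised = False
except KeyError:
    # Newer pandas: loc raises, so ask for the NaN-filled result explicitly.
    result = df.reindex(columns=["a", "d"])
    loc_raised = True

print("loc raised KeyError:", loc_raised)
print(result)
```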
When you don't use ix or loc and instead try df['d'], you're indexing a specific column or list of columns. There is no expectation of enlargement here unless you are assigning to a new column, e.g. df['d'] = some_new_vals
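The difference between reading and assigning a missing column can be checked directly. A small sketch, assuming an illustrative frame and a column of zeros as the new values:

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c.
df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), columns=list("abc"))

# Reading a column that does not exist raises KeyError.
try:
    df["d"]
    read_failed = False
except KeyError:
    read_failed = True

# Assigning to the same label creates the column (one value per row).
df["d"] = np.zeros(len(df))
print(df.columns.tolist())
```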
To guard against this, you can validate your list using isin with the columns:
In [26]:
valid_cols = df.columns.isin(['a','d'])
df.ix[:, valid_cols]
Out[26]:
a
0 -1.164349
1 0.400116
2 -0.599496
3 0.186837
4 0.385656
Now you will only see columns that exist, and this also guards against any misspelt column names.
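A variation on the same guard, for when you also want to know which names were dropped. This is only a sketch: the names wanted, missing and present are hypothetical, and it uses loc rather than ix:

```python
import numpy as np
import pandas as pd

# Illustrative frame with columns a, b, c.
df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), columns=list("abc"))
wanted = ["a", "d"]  # "d" does not exist in df

# Labels that were requested but are not columns of df.
missing = [col for col in wanted if col not in df.columns]

# Keep only the requested labels that exist, preserving the requested order.
present = [col for col in wanted if col in df.columns]
subset = df.loc[:, present]

print("missing:", missing)
print(subset)
```

Reporting the missing names instead of silently dropping them makes a misspelt column easier to spot.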
Answer 2:
For me, selecting a subset with a list of column names works as expected:
final_feature[['vendor_id','this column is absent']]
KeyError: "['this column is absent'] not in index"
Also, ix is deprecated in the latest version of pandas (0.20.1); check here.
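With ix deprecated, the label-based and position-based selections it handled are usually written with loc and iloc. A sketch with a made-up frame standing in for final_feature (only vendor_id comes from the question; the second column and all values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for final_feature; only the vendor_id name is from the question.
final_feature = pd.DataFrame({"vendor_id": [434236, 434237], "amount": [1.5, 2.5]})

# Label-based selection (what ix did with column names).
by_label = final_feature.loc[:, ["vendor_id"]]

# Position-based selection (what ix did with integer positions).
by_position = final_feature.iloc[:, [0]]

print(by_label.equals(by_position))
```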
Source: https://stackoverflow.com/questions/43911115/python-slicing-does-not-give-key-error-even-when-the-column-is-missing