Python slicing does not raise a KeyError even when the column is missing

Submitted by 空扰寡人 on 2019-12-24 11:56:11

Question


I have a pandas DataFrame with 10 columns (keys). If I try to access a column that is not present, I still get NaN for it instead of the KeyError I was expecting. Why doesn't pandas detect the missing column?

In the example below, vendor_id is a valid column in the DataFrame; the other column is absent from the dataset.

final_feature.ix[:,['vendor_id','this column is absent']]
Out[1017]: 
  vendor_id  this column is absent
0    434236                    NaN

type(final_feature)
Out[1016]: pandas.core.frame.DataFrame

EDIT 1: I validated that there are no null values in the data:

print(final_feature1.isnull().values.any())

Answer 1:


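This is expected behaviour in this version of pandas and is related to the setting with enlargement feature: ix and loc treat a list of labels like a reindex, so labels that don't exist are not an error.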

In [15]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df.ix[:,['a','d']]

Out[15]:
          a   d
0 -1.164349 NaN
1  0.400116 NaN
2 -0.599496 NaN
3  0.186837 NaN
4  0.385656 NaN

If you try df['d'] or df[['a','d']], you will get a KeyError.

Effectively, what you're doing is reindexing: with ix it doesn't matter that the column doesn't exist; you just get a column of NaNs.
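In current pandas, the explicit way to ask for this reindexing behaviour is DataFrame.reindex, which keeps existing labels and fills missing ones with NaN. A minimal sketch, assuming a small frame with fixed values instead of the random data above:

```python
import pandas as pd

# Small frame with columns a, b, c (fixed values so the result is reproducible)
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# reindex keeps the existing column "a" and creates the missing column "d"
# filled with NaN, mirroring what ix/loc did with a list of labels here
result = df.reindex(columns=["a", "d"])
print(result)
```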

The same behaviour is observed with loc:

In [24]:
df.loc[:,['a','d']]

Out[24]:
          a   d
0 -1.164349 NaN
1  0.400116 NaN
2 -0.599496 NaN
3  0.186837 NaN
4  0.385656 NaN

When you don't use ix or loc and instead write df['d'], you are indexing a specific column (or list of columns), so there is no expectation of enlargement unless you are assigning to a new column, e.g. df['d'] = some_new_vals.
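The contrast between selecting and assigning with plain [] can be sketched as follows; the column names and values are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Selecting a list of columns with [] raises KeyError if any name is missing
selection_failed = False
try:
    df[["a", "d"]]
except KeyError as exc:
    selection_failed = True
    print("KeyError:", exc)

# Assigning to a missing column is setting with enlargement: the column is created
df["d"] = [10, 20, 30]
print(list(df.columns))  # ['a', 'd']
```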

To guard against this, you can validate your list against the existing columns using isin:

In [26]:
valid_cols = df.columns.isin(['a','d'])
df.loc[:, valid_cols]

Out[26]:
          a
0 -1.164349
1  0.400116
2 -0.599496
3  0.186837
4  0.385656

Now you will only see columns that exist, so a misspelt column name is dropped instead of showing up as a column of NaNs.
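If you would rather get an error than have misspelt names silently dropped, you can check the requested names yourself before selecting. A minimal sketch; select_columns_strict is a hypothetical helper name, not a pandas function:

```python
import pandas as pd


def select_columns_strict(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Return the requested columns, raising KeyError that lists every missing one.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Every name exists, so .loc cannot introduce NaN-filled columns here
    return df.loc[:, cols]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(select_columns_strict(df, ["b", "a"]))

try:
    select_columns_strict(df, ["a", "d"])
except KeyError as exc:
    print(exc)
```

Unlike the isin mask, which returns columns in the DataFrame's own order, this keeps them in the order you requested.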




Answer 2:


For me, selecting a subset of columns with [] works as expected and raises a KeyError:

final_feature[['vendor_id','this column is absent']]

KeyError: "['this column is absent'] not in index"

Also, ix is deprecated in the latest version of pandas (0.20.1); see the pandas documentation on the deprecation of ix.
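Because ix mixed label-based and position-based lookups, its usual replacements are .loc for labels and .iloc for integer positions. A minimal sketch; the frame below is illustrative, and the column "other" is an assumption rather than data from the question:

```python
import pandas as pd

# Illustrative stand-in for the question's DataFrame
final_feature = pd.DataFrame({"vendor_id": [434236], "other": [1]})

# Label-based selection: the replacement for ix with column names
by_label = final_feature.loc[:, ["vendor_id"]]

# Position-based selection: the replacement for ix with integer positions
by_position = final_feature.iloc[:, [0]]

print(by_label.equals(by_position))  # True
```

In pandas 1.0 and later, passing a list that contains a missing label to .loc raises a KeyError, so the NaN column from the question no longer appears; reindex is the explicit alternative when NaN-filled columns are actually wanted.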



Source: https://stackoverflow.com/questions/43911115/python-slicing-does-not-give-key-error-even-when-the-column-is-missing
