Finding bogus data in a pandas dataframe read with read_fwf()

Posted by 一笑奈何 on 2019-12-05 08:41:33

First, let's mock up some data:

import numpy as np
import pandas

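# 5x5 frame of standard-normal draws with labelled rows and columns
df = pandas.DataFrame(
    np.random.normal(size=(5, 5)),
    index='rA,rB,rC,rD,rE'.split(','),
    columns='cA,cB,cC,cD,cE'.split(',')
)
# Overwrite every value above 1 with inf to simulate bogus data
df[df > 1] = np.inf
df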

That, for example, should give something like this (the exact numbers vary from run to run because the draws are random):

          cA        cB        cC        cD        cE
rA -1.202383 -0.625521       inf -0.888086 -0.215671
rB  0.537521 -1.149731  0.841687  0.190505       inf
rC -1.447124 -0.607486 -1.268923       inf  0.438190
rD -0.275085  0.793483  0.276376 -0.095727 -0.050957
rE -0.095414  0.048926  0.591899  0.298865 -0.308620

So now I can use a boolean mask to isolate all the infs; every cell that is not inf shows up as NaN.

print(df[np.isinf(df)].to_string())

    cA  cB   cC   cD   cE
rA NaN NaN  inf  NaN  NaN
rB NaN NaN  NaN  NaN  inf
rC NaN NaN  NaN  inf  NaN
rD NaN NaN  NaN  NaN  NaN
rE NaN NaN  NaN  NaN  NaN

But that's not really useful on its own. So on top of finding the infs, we should stack the column index into the rows (unpivot, if you will), then drop all the NaN values. This gives us a compact summary of the row/column pairs that contain infs.

df[np.isinf(df)].stack().dropna()

rA  cC    inf
rB  cE    inf
rC  cD    inf
dtype: float64
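
The same summary can also be built directly from the positions of the infinite cells, which keeps the sign of each inf visible. What follows is a minimal sketch rather than part of the original answer: the helper name `find_infs` and the small hand-built frame are made up for illustration, and it assumes every column can be converted to float.

```python
import numpy as np
import pandas


def find_infs(frame):
    """Return (row label, column label, value) for every +/-inf cell."""
    # Assumption: all columns are numeric, so the conversion to float succeeds.
    values = frame.to_numpy(dtype=float)
    # np.nonzero yields the row and column positions of the True cells,
    # ordered row by row.
    row_pos, col_pos = np.nonzero(np.isinf(values))
    return [
        (frame.index[r], frame.columns[c], float(values[r, c]))
        for r, c in zip(row_pos, col_pos)
    ]


# Hypothetical data, chosen by hand so the result is deterministic.
demo = pandas.DataFrame(
    {'cA': [0.5, -np.inf, 0.1], 'cB': [np.inf, 0.2, 0.3]},
    index=['rA', 'rB', 'rC'],
)
print(find_infs(demo))  # [('rA', 'cB', inf), ('rB', 'cA', -inf)]
```

A list of tuples is convenient for logging or for looping over the offending cells; the stacked Series is the better fit if you want to keep working in pandas.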

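np.isinf will fail if you have object dtypes in your DataFrame. To overcome this, have pandas treat inf as a missing value and then test for nulls:

# Older pandas spelled this option 'mode.use_inf_as_null'; it was later renamed
# 'mode.use_inf_as_na', which recent releases have deprecated in turn.
with pandas.option_context('mode.use_inf_as_na', True):
    is_bad_data = df.isnull()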
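
The option used above is not available under the same name in every pandas release, so an approach that avoids options altogether may be more portable. The sketch below is one such alternative, not the original answer's method: it coerces each column to numeric with `pandas.to_numeric(errors='coerce')`, which turns unparseable values into NaN, and then flags everything that is not finite. The name `flag_bad` and the sample frame are hypothetical.

```python
import numpy as np
import pandas


def flag_bad(frame):
    """Boolean frame: True where a cell is non-numeric, NaN, or +/-inf."""
    # errors='coerce' replaces anything that cannot be parsed as a number with NaN.
    numeric = frame.apply(pandas.to_numeric, errors='coerce')
    # np.isfinite is False for NaN, +inf and -inf, so negating it marks bad cells.
    return ~np.isfinite(numeric)


# Hypothetical frame: column cA has object dtype because of the stray string.
mixed = pandas.DataFrame(
    {'cA': [1.0, 'oops', 3.0], 'cB': [np.inf, 2.0, 4.0]},
    index=['rA', 'rB', 'rC'],
)
bad = flag_bad(mixed)
print(bad)
# Rows that contain at least one bad cell
print(mixed[bad.any(axis=1)])
```

Note that this treats NaN and unparseable strings as bad data alongside inf, which is usually what you want when hunting for bogus values in a freshly parsed file.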