-9999 as missing value with numpy.genfromtxt()

Posted by 送分小仙女 on 2019-12-24 01:17:03

Question


Let's say I have a dumb text file with the contents:

Year    Recon   Observed
1505    162.38        23      
1506     46.14     -9999      
1507    147.49     -9999      

-9999 is used to denote a missing value (don't ask).

So, I should be able to read this into a Numpy array with:

import numpy as np
x = np.genfromtxt("file.txt", dtype = None, names = True, missing_values = -9999)

And have all my little -9999s turn into numpy.nan. But, I get:

>>> x
array([(1505, 162.38, 23), (1506, 46.14, -9999), (1507, 147.49, -9999)], 
  dtype=[('Year', '<i8'), ('Recon', '<f8'), ('Observed', '<i8')])

... That's not right...

Am I missing something?


Answer 1:


Nope, you're not doing anything wrong. Using the missing_values argument indeed tells np.genfromtxt that the corresponding values should be flagged as "missing/invalid". The problem is that dealing with missing values is only supported if you use the usemask=True argument (I probably should have made that clearer in the documentation, my bad).

With usemask=True, the output is a masked array. You can transform it into a regular ndarray with the missing values replaced by np.nan with the method .filled(np.nan).

Be careful, though: if you have a column that was detected as having an int dtype and you try to fill its missing values with np.nan, you won't get what you expect (np.nan is only supported for float columns).
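As a minimal sketch of the approach described above, the following reads the question's data with usemask=True and then fills a single column with np.nan. The in-memory io.StringIO buffer stands in for file.txt, and the per-column cast to float is an assumption about how one might work around the integer limitation, not something the answer prescribes.

```python
import io
import numpy as np

# Same contents as the question's file.txt, held in memory so the sketch is self-contained.
text = """Year    Recon   Observed
1505    162.38        23
1506     46.14     -9999
1507    147.49     -9999
"""

# usemask=True makes genfromtxt return a masked array in which every
# field equal to the string "-9999" is flagged as missing.
x = np.genfromtxt(io.StringIO(text), dtype=None, names=True,
                  missing_values="-9999", usemask=True, encoding="utf-8")

# 'Observed' was detected as an integer column, and integers cannot hold NaN,
# so cast that one column to float before filling the masked entries.
observed = x["Observed"].astype(float).filled(np.nan)
print(observed)
```

Filling column by column keeps Year as an integer column while still giving NaN where Observed was flagged as missing.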




Answer 2:


Trying:

>>> x = np.genfromtxt("file.txt",names = True, missing_values = "-9999", dtype=None)
>>> x
array([(1505, 162.38, 23), (1506, 46.14, -9999), (1507, 147.49, -9999)], 
      dtype=[('Year', '<i8'), ('Recon', '<f8'), ('Observed', '<i8')])

does not give the correct answer. So just making it a string doesn't help. However, if an additional flag, usemask=True is added, you get:

>>> x = np.genfromtxt("file.txt",names = True, missing_values = -9999, dtype=None, usemask=True)
>>> x
masked_array(data = [(1505, 162.38, 23) (1506, 46.14, --) (1507, 147.49, --)],
             mask = [(False, False, False) (False, False, True) (False, False, True)],
       fill_value = (999999, 1e+20, 999999),
            dtype = [('Year', '<i8'), ('Recon', '<f8'), ('Observed', '<i8')])  

which gives what you want in a MaskedArray, which may be useable for you anyway.
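If the masked array is kept as is, reductions on a column skip the masked entries. The sketch below illustrates this under the assumption that the file holds the three rows from the question; the io.StringIO buffer stands in for file.txt.

```python
import io
import numpy as np

# In-memory copy of the question's data (stand-in for file.txt).
text = """Year    Recon   Observed
1505    162.38        23
1506     46.14     -9999
1507    147.49     -9999
"""

x = np.genfromtxt(io.StringIO(text), dtype=None, names=True,
                  missing_values="-9999", usemask=True, encoding="utf-8")

observed = x["Observed"]       # masked view of one column
n_valid = observed.count()     # number of entries that are not masked
mean_valid = observed.mean()   # mean over the unmasked entries only
print(n_valid, mean_valid)
```

Here only the 1505 row contributes to the mean, since the other two Observed values are masked.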




Answer 3:


The NumPy documentation at SciPy suggests that missing_values should be a string to work the way you want. A straight numeric value seems to be interpreted as a column index.
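One way to sidestep any ambiguity about how a bare number is interpreted is to pass missing_values as a dictionary that maps a column name to the string marking missing data. This is a sketch, assuming the question's three rows and using an io.StringIO buffer in place of file.txt; it still relies on usemask=True for the flag to have any visible effect.

```python
import io
import numpy as np

# In-memory copy of the question's data (stand-in for file.txt).
text = """Year    Recon   Observed
1505    162.38        23
1506     46.14     -9999
1507    147.49     -9999
"""

# Only the 'Observed' column treats the string "-9999" as missing.
x = np.genfromtxt(io.StringIO(text), dtype=None, names=True,
                  missing_values={"Observed": "-9999"},
                  usemask=True, encoding="utf-8")

print(x["Observed"].mask)
print(x["Recon"].mask)
```

Restricting the marker to one column avoids masking a legitimate -9999 that might appear elsewhere in the file.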



Source: https://stackoverflow.com/questions/12274709/9999-as-missing-value-with-numpy-genfromtxt
