NumPy: calculate averages with NaNs removed

慢半拍i 2020-11-27 18:45

How can I calculate the mean values of a matrix along an axis, with NaN values excluded from the calculation? (For R users, think na.rm = TRUE.)

Here

12 answers
  • 2020-11-27 19:14

    From numpy 1.8 (released 2013-10-30) onwards, nanmean does precisely what you need:

    >>> import numpy as np
    >>> np.nanmean(np.array([1.5, 3.5, np.nan]))
    2.5
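    A sketch extending this to the row-wise case from the question, using the 4x4 example matrix that the other answers here also use. np.nanmean accepts the same axis argument as np.mean. A row that is entirely NaN yields nan and triggers a RuntimeWarning ("Mean of empty slice"), which this sketch silences with warnings.catch_warnings. The manual version (sum of non-NaN values divided by their count) is only an illustration of what "ignoring NaN" means, not a claim about NumPy's internals.

```python
import warnings

import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# Row means ignoring NaN. The last row is all NaN, so its mean is NaN and
# NumPy emits a RuntimeWarning; suppress it for this block only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = np.nanmean(dat, axis=1)

# Same idea written out by hand: sum of non-NaN values / count of non-NaN values.
# For the all-NaN row this is 0.0 / 0, which gives NaN ("invalid" is silenced).
counts = np.count_nonzero(~np.isnan(dat), axis=1)
with np.errstate(invalid="ignore"):
    manual = np.nansum(dat, axis=1) / counts

print(row_means)
print(manual)
```

    Both arrays hold 2.0, 4.5, 6.0 and nan for the four rows.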
    
  • 2020-11-27 19:16
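    # I suggest this approach:
    import numpy as np
    dat  = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
    # Mask NaN (and inf) entries so that reductions skip them
    dat2 = np.ma.masked_invalid(dat)
    print(np.mean(dat2, axis=1))   # the all-NaN last row comes out masked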
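    The result of the masked approach is itself a masked array, with the all-NaN row shown as "--". If downstream code expects a plain ndarray, one option (a sketch, not the only way) is MaskedArray.filled, which substitutes a value, here NaN, wherever the result is masked:

```python
import numpy as np

dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])

# Mask NaN (and +/-inf) entries; reductions on a masked array skip masked cells.
masked = np.ma.masked_invalid(dat)
row_means = masked.mean(axis=1)      # masked array; the all-NaN row is masked

# Convert back to a plain ndarray, writing NaN where the result is masked.
plain = row_means.filled(np.nan)
print(plain)
```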
    
  • 2020-11-27 19:19

    How about using Pandas to do this:

    import numpy as np
    import pandas as pd
    dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
    print(dat)
    print(dat.mean(1))   # plain NumPy mean: any NaN in a row makes that row's mean NaN

    df = pd.DataFrame(dat)
    print(df.mean(axis=1))   # pandas skips NaN by default
    

    The last line, df.mean(axis=1), gives:

    0    2.0
    1    4.5
    2    6.0
    3    NaN
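    The NaN-skipping behaviour comes from the skipna parameter of DataFrame.mean, which defaults to True. A small sketch contrasting it with skipna=False, which propagates NaN the way the plain NumPy mean does:

```python
import numpy as np
import pandas as pd

dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
df = pd.DataFrame(dat)

# skipna=True (the default): NaN cells are ignored, like R's na.rm = TRUE.
skipped = df.mean(axis=1)
# skipna=False: any row containing a NaN gets a NaN mean.
propagated = df.mean(axis=1, skipna=False)

print(skipped.to_numpy())
print(propagated.to_numpy())
```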
    
  • 2020-11-27 19:22

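    Assuming you've also got SciPy installed (note that scipy.stats.nanmean was later deprecated in favour of numpy.nanmean and is no longer available in current SciPy releases):

    http://www.scipy.org/doc/api_docs/SciPy.stats.stats.html#nanmean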

  • 2020-11-27 19:22

    One more speed check for all proposed approaches:

    Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Jan 19 2016, 12:08:31) [MSC v.1500 64 bit (AMD64)]
    IPython 4.0.1 -- An enhanced Interactive Python.
    
    import numpy as np
    from scipy.stats.stats import nanmean    
    dat = np.random.normal(size=(1000,1000))
    ii = np.ix_(np.random.randint(0,99,size=50),np.random.randint(0,99,size=50))
    dat[ii] = np.nan
    In[185]: def method1():
        mdat = np.ma.masked_array(dat,np.isnan(dat))
        mm = np.mean(mdat,axis=1)
        return mm.filled(np.nan)
    
    In[190]: %timeit method1()
    100 loops, best of 3: 7.09 ms per loop
    In[191]: %timeit [np.mean([l for l in d if not np.isnan(l)]) for d in dat]
    1 loops, best of 3: 1.04 s per loop
    In[192]: %timeit np.array([r[np.isfinite(r)].mean() for r in dat])
    10 loops, best of 3: 19.6 ms per loop
    In[193]: %timeit np.ma.masked_invalid(dat).mean(axis=1)
    100 loops, best of 3: 11.8 ms per loop
    In[194]: %timeit nanmean(dat,axis=1)
    100 loops, best of 3: 6.36 ms per loop
    In[195]: import bottleneck as bn
    In[196]: %timeit bn.nanmean(dat,axis=1)
    1000 loops, best of 3: 1.05 ms per loop
    In[197]: from scipy import stats
    In[198]: %timeit stats.nanmean(dat)
    100 loops, best of 3: 6.19 ms per loop
    

    So the fastest is bottleneck.nanmean(dat, axis=1); scipy.stats.nanmean(dat) is not faster than numpy.nanmean(dat, axis=1).
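    The session above does not actually time numpy.nanmean itself, and it runs under Python 2 with IPython's %timeit. A minimal Python 3 sketch for rerunning the comparison with the standard timeit module, assuming a seeded np.random.default_rng in place of the original unseeded np.random calls; bottleneck is timed only if it happens to be installed. It checks that all approaches agree before timing them and reports no figures here, since results depend on the machine and library versions.

```python
import timeit

import numpy as np

# Setup mirrors the session above: a 1000x1000 array with a block of NaNs.
rng = np.random.default_rng(0)
dat = rng.normal(size=(1000, 1000))
ii = np.ix_(rng.integers(0, 99, size=50), rng.integers(0, 99, size=50))
dat[ii] = np.nan

candidates = {
    "np.nanmean": lambda: np.nanmean(dat, axis=1),
    "masked_invalid": lambda: np.ma.masked_invalid(dat).mean(axis=1).filled(np.nan),
    "row comprehension": lambda: np.array([r[np.isfinite(r)].mean() for r in dat]),
}

try:  # optional third-party dependency
    import bottleneck as bn
    candidates["bottleneck"] = lambda: bn.nanmean(dat, axis=1)
except ImportError:
    pass

# All approaches should agree before their speed is compared.
reference = candidates["np.nanmean"]()
for name, fn in candidates.items():
    assert np.allclose(fn(), reference, equal_nan=True), name

for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.2f} ms per call")
```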

  • 2020-11-27 19:25
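    import numpy as np
    # dataMat is assumed to be a 2-D np.matrix that may contain NaNs (example data)
    dataMat = np.matrix([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan]])
    numFeat = np.shape(dataMat)[1]
    for i in range(numFeat):
        # row indices where column i is not NaN (.A turns the matrix column into an ndarray)
        meanVal = np.mean(dataMat[np.nonzero(~np.isnan(dataMat[:, i].A))[0], i])
        print(meanVal)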
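    The loop above computes a NaN-ignoring mean for each column (feature) separately. A sketch of a simplified version of that loop next to a vectorized equivalent, np.nanmean over axis=0 after converting the matrix to a plain ndarray with np.asarray; the example matrix is made up for illustration:

```python
import numpy as np

# Hypothetical example data; the loop above expects a 2-D np.matrix.
dataMat = np.matrix([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan]])

# Loop version, one column (feature) at a time.
loop_means = []
for i in range(np.shape(dataMat)[1]):
    col = dataMat[:, i].A.ravel()          # column as a flat ndarray
    loop_means.append(col[~np.isnan(col)].mean())

# Vectorized equivalent: column means (axis=0) ignoring NaN.
vec_means = np.nanmean(np.asarray(dataMat), axis=0)
print(vec_means)
```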
    