How to drop rows of Pandas DataFrame whose value in a certain column is NaN

一生所求 2020-11-22 00:59

I have this DataFrame and want only the records whose EPS column is not NaN:

>>> df
                 STK_ID           


        
12 Answers
  •  自闭症患者
    2020-11-22 01:30

    This question is already resolved, but also consider the solution Wouter suggested in his original comment. Handling of missing data, including dropna(), is built into pandas explicitly. Besides potentially performing better than a manual filter, these functions come with a variety of options that may be useful.

    In [23]: import numpy as np; import pandas as pd
    
    In [24]: df = pd.DataFrame(np.random.randn(10,3))
    
    In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;
    
    In [26]: df
    Out[26]:
              0         1         2
    0       NaN       NaN       NaN
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN
    

    In [27]: df.dropna()     # drop all rows that have any NaN values
    Out[27]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
    

    In [28]: df.dropna(how='all')     # drop a row only if ALL of its columns are NaN
    Out[28]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN
    

    In [29]: df.dropna(thresh=2)   # drop a row unless it has at least two non-NaN values
    Out[29]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN
    

    In [30]: df.dropna(subset=[1])   # drop a row only if it has NaN in a specific column (as asked in the question)
    Out[30]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN
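
Applied to the question itself, the subset option would look roughly like the sketch below. The question's DataFrame is cut off, so the frame here is a hypothetical stand-in: only the column names STK_ID and EPS come from the question, and the values are made up. The boolean-mask line is the "manual" alternative, shown only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: only the column names
# STK_ID and EPS are from the question, the values are invented.
df = pd.DataFrame({
    "STK_ID": ["A", "B", "C", "D"],
    "EPS": [np.nan, 0.64, np.nan, 1.21],
})

# Built-in approach: drop rows whose EPS is NaN, ignoring other columns.
kept = df.dropna(subset=["EPS"])

# Manual approach: keep rows where a boolean mask says EPS is present.
kept_manual = df[df["EPS"].notna()]

print(kept)
print(kept.equals(kept_manual))
```

Both approaches return a new DataFrame and keep the original row labels; assign the result back to df if you want to replace the original.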
    

    There are other options too, including dropping columns instead of rows (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html).
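
For the column case, here is a minimal sketch using the axis parameter of dropna. The small frame and its column names a, b, c are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Made-up frame: "a" is complete, "b" has one NaN, "c" is entirely NaN.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [np.nan, np.nan, np.nan],
})

# Drop every column that contains at least one NaN.
cols_any = df.dropna(axis="columns")

# Drop only the columns in which every value is NaN.
cols_all = df.dropna(axis="columns", how="all")

print(list(cols_any.columns))
print(list(cols_all.columns))
```

The how option works the same way as in the row examples, it just counts NaN values down each column instead of across each row.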

    Pretty handy!
