Querying for NaN and other names in Pandas

生来不讨喜 2020-12-12 20:50

Say I have a dataframe df with a column value holding some float values and some NaN. How can I get the part of the dataframe where we

7 Answers
  • 2020-12-12 21:01

    For rows where value is not null

    df.query("value == value")
    

    For rows where value is null

    df.query("value != value")
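
As a quick sanity check, the sketch below compares both self-comparison queries with the equivalent notna()/isna() masks. The sample frame is made up for illustration and is not from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the question): two NaN rows among floats.
df = pd.DataFrame({"value": [1.5, np.nan, 3.0, np.nan]})

not_null = df.query("value == value")   # NaN is not equal to itself, so NaN rows drop out
is_null = df.query("value != value")    # only NaN rows are unequal to themselves

# The same selections written with explicit boolean masks.
assert not_null.equals(df[df["value"].notna()])
assert is_null.equals(df[df["value"].isna()])
```

If readability matters more than staying inside a query string, the mask form says the same thing more directly.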
    
  • 2020-12-12 21:05

    Pandas fills empty cells in a DataFrame with NumPy's nan value. This value has some unusual properties. For one, nothing is equal to it, not even itself. As a result, you can't find it with an equality check against any particular value.

    In : 'nan' == np.nan
    Out: False
    
    In : None == np.nan
    Out: False
    
    In : np.nan == np.nan
    Out: False
    

    However, because a cell containing a np.nan value will not be equal to anything, including another np.nan value, we can check to see if it is unequal to itself.

    In : np.nan != np.nan
    Out: True
    

    You can take advantage of this with the pandas query method by selecting the rows where the value in a particular column is unequal to itself.

    df.query('a != a')
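
The same property can be seen at the NumPy level. This minimal sketch, using an array invented for illustration, shows that an elementwise self-comparison marks exactly the positions that np.isnan reports.

```python
import numpy as np

# Illustrative array (an assumption, not from the answer).
arr = np.array([1.0, np.nan, 3.0])

self_unequal = arr != arr   # True only where the element is NaN
explicit = np.isnan(arr)    # the dedicated NaN test

# Both approaches flag the same positions.
assert (self_unequal == explicit).all()
# A scalar NaN is not equal to itself either.
assert not (np.nan == np.nan)
```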
    
  • 2020-12-12 21:07

    I think other answers will normally be better. In one case, my query had to go through eval (use eval very carefully) and the syntax below was useful. Every real number is either less than 10 or greater than or equal to 10, so requiring both comparisons to fail excludes all numbers and leaves only null-like values.

    df = pd.DataFrame({'value':[3,4,9,10,11,np.nan, 12]})
    
    df.query("value < 10 or (~(value < 10) and ~(value >= 10))")
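
To see why this works, it can help to evaluate the two comparisons separately. The sketch below uses plain boolean masks rather than a query string (a deliberate substitution for clarity) on the same data as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

below = df["value"] < 10          # False for NaN
at_or_above = df["value"] >= 10   # also False for NaN

# A real number satisfies exactly one of the two comparisons; NaN satisfies neither.
neither = ~below & ~at_or_above

# Rows below 10, plus the rows that failed both comparisons (the NaN rows).
selected = df[below | neither]
```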
    
  • 2020-12-12 21:08

    According to this answer you can use:

    df.query('value < 10 | value.isnull()', engine='python')
    

    I verified that it works.
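
A minimal sketch of the same idea, checked against an ordinary boolean mask. The parentheses and the use of `or` instead of `|` are my own choices for readability, and the data is borrowed from the other answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# engine="python" evaluates the expression with ordinary pandas operations,
# which allows Series methods such as isnull() inside the query string.
result = df.query("(value < 10) or value.isnull()", engine="python")

# Equivalent selection written as a boolean mask.
expected = df[(df["value"] < 10) | df["value"].isnull()]
assert result.equals(expected)
```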

  • 2020-12-12 21:12

    In general, you could use @local_variable_name, so something like

    >>> pi = np.pi; nan = np.nan
    >>> df = pd.DataFrame({"value": [3,4,9,10,11,np.nan,12]})
    >>> df.query("(value < 10) and (value > @pi)")
       value
    1      4
    2      9
    

    would work, but nan isn't equal to itself, so value == NaN will always be false. One way to hack around this is to use that fact, and use value != value as an isnan check. We have

    >>> df.query("(value < 10) or (value == @nan)")
       value
    0      3
    1      4
    2      9
    

    but

    >>> df.query("(value < 10) or (value != value)")
       value
    0      3
    1      4
    2      9
    5    NaN
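
Combining both ideas, here is a rough sketch of a small wrapper that takes the threshold as a local variable. The function name below_or_missing and its parameters are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

def below_or_missing(frame, limit):
    """Rows whose `value` is below `limit` or is NaN (hypothetical helper)."""
    # @limit refers to the local Python variable; value != value is the NaN test.
    return frame.query("(value < @limit) or (value != value)")

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})
result = below_or_missing(df, 10)
```

Passing the threshold through @limit avoids building the query string by hand with string formatting.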
    
  • 2020-12-12 21:13

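    This should also work if the column stores the literal string 'NaN' rather than a float NaN: df.query("value == 'NaN'"). A float NaN is never equal to anything, so this comparison will not select it.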
