Filtering all rows with NaT in a column of a pandas DataFrame (Python)

有刺的猬 2020-12-28 13:01

I have a df like this:

    a b           c
    1 NaT         w
    2 2014-02-01  g
    3 NaT         x   

    df = df[df.b == '2014-02-01']

4 Answers
  • 2020-12-28 13:21

    Using your example dataframe:

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3],
                       "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                       "c": ["w", "g", "x"]})
    

    Before v0.17 this did not work:

    df.query('b != b') 
    

    and you had to do:

    df.query('b == "NaT"') # yes, surprisingly, this works!
    

    Since v0.17 though, both methods work, although I would only recommend the first one.
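
A minimal, self-contained sketch of the first approach on the example frame. Passing engine="python" is my own addition, so that the snippet does not depend on numexpr being installed; the answer above calls query without it. The names nat_rows and dated_rows are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

# NaT is never equal to itself, so "b != b" is True exactly on the NaT rows.
nat_rows = df.query("b != b", engine="python")

# The complementary expression keeps the rows that hold a real timestamp.
dated_rows = df.query("b == b", engine="python")

print(nat_rows)
print(dated_rows)
```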

  • 2020-12-28 13:25

    isnull and notnull work with NaT so you can handle them much the same way you handle NaNs:

    >>> df
    
       a          b  c
    0  1        NaT  w
    1  2 2014-02-01  g
    2  3        NaT  x
    
    >>> df.dtypes
    
    a             int64
    b    datetime64[ns]
    c            object
    

    Just use isnull to select the rows where b is NaT:

    df[df.b.isnull()]
    
       a   b  c
    0  1 NaT  w
    2  3 NaT  x
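
In newer pandas releases the same checks are also available under the names isna and notna, and dropna with a subset gives the complement directly. A short sketch on the example frame (the variable names missing, present and dropped are mine):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

missing = df[df["b"].isna()]        # rows where b is NaT
present = df[df["b"].notna()]       # rows where b holds a timestamp
dropped = df.dropna(subset=["b"])   # drops rows with NaT in b; same rows as `present`

print(missing)
print(present)
```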
    
  • 2020-12-28 13:26

    I feel that the comment by @DSM is worth an answer of its own, because it addresses the fundamental question.

    The misunderstanding comes from the assumption that pd.NaT acts like None. However, while None == None returns True, pd.NaT == pd.NaT returns False. Pandas NaT behaves like a floating-point NaN, which is not equal to itself.

    As the previous answer explains, you should use

    df[df.b.isnull()] # or notnull(), respectively
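
To make the comparison behaviour concrete, here is a small sketch. It reuses the example frame from the question; the name mask is only for illustration.

```python
import pandas as pd

print(None == None)          # True
print(pd.NaT == pd.NaT)      # False
print(pd.isna(pd.NaT))       # True

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT],
                   "c": ["w", "g", "x"]})

# Comparing the column against NaT with == never matches, even on the NaT rows.
mask = df["b"] == pd.NaT
print(mask.any())               # False

# isnull() is the check that actually finds them.
print(df["b"].isnull().sum())   # 2
```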
    
  • 2020-12-28 13:41

    For those interested: in my case I wanted to drop the NaT values contained in the DatetimeIndex of a dataframe. I could not directly use the notnull construction suggested by Karl D. Instead, I created a temporary column out of the index, applied the mask, and then deleted the temporary column again.

    df["TMP"] = df.index.values                # index is a DatetimeIndex
    df = df[df.TMP.notnull()]                  # keep only rows whose index is not NaT
    df = df.drop(["TMP"], axis=1)              # delete TMP again (no inplace on a filtered frame)
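
For what it is worth, recent pandas versions expose notna on the index itself, so the temporary column may not be needed there. A sketch under that assumption, using a made-up frame (idx, value and cleaned are illustrative names, not from the answer above):

```python
import pandas as pd

# A DatetimeIndex with one missing entry, which pandas stores as NaT.
idx = pd.to_datetime(["2014-02-01", None, "2014-02-03"])
df = pd.DataFrame({"value": [10, 20, 30]}, index=idx)

# Boolean mask computed directly on the index; keeps rows whose label is not NaT.
cleaned = df[df.index.notna()]

print(cleaned)
```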
    