Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

轻奢々 2020-12-02 12:58

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df


        
2 answers
  • 2020-12-02 13:40

    In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

    >>> df['dt2'] = df['dt'].dt.floor('h')
    >>> df
                          dt                     dt2
    0    2014-10-01 10:02:45     2014-10-01 10:00:00
    1    2014-10-01 13:08:17     2014-10-01 13:00:00
    2    2014-10-01 17:39:24     2014-10-01 17:00:00
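
For comparison, here is a minimal self-contained sketch of `floor` next to its siblings `ceil` and `round`. The sample data mirrors the frame above; the column names `floor`, `ceil` and `round` are only illustrative.

```python
import pandas as pd

# Sample data matching the question's frame (tz-naive timestamps).
df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
])})

# floor rounds down, ceil rounds up, round goes to the nearest hour.
df["floor"] = df["dt"].dt.floor("h")
df["ceil"] = df["dt"].dt.ceil("h")
df["round"] = df["dt"].dt.round("h")

print(df)
```

Only `floor` truncates; `round` moves 17:39:24 forward to 18:00:00, which is usually not what "truncate" means.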
    

    Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

    You can cast the underlying NumPy datetime64 values from [ns] to a coarser unit such as [h]. The cast returns a new array and leaves df['dt'] itself unchanged:

    df['dt'].values.astype('<M8[h]')
    

    This truncates everything to hour precision. For example:

    >>> df
                           dt
    0     2014-10-01 10:02:45
    1     2014-10-01 13:08:17
    2     2014-10-01 17:39:24
    
    >>> df['dt2'] = df['dt'].values.astype('<M8[h]')
    >>> df
                          dt                     dt2
    0    2014-10-01 10:02:45     2014-10-01 10:00:00
    1    2014-10-01 13:08:17     2014-10-01 13:00:00
    2    2014-10-01 17:39:24     2014-10-01 17:00:00
    
    >>> df.dtypes
    dt     datetime64[ns]
    dt2    datetime64[ns]
    

    The same method should work for any other unit: months 'M', minutes 'm', and so on:

    • Keep up to year: '<M8[Y]'
    • Keep up to month: '<M8[M]'
    • Keep up to day: '<M8[D]'
    • Keep up to minute: '<M8[m]'
    • Keep up to second: '<M8[s]'
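
As a rough sketch of how the unit list above could be wrapped up for reuse: `truncate_to_unit` is a hypothetical helper name, not part of pandas, and it assumes a tz-naive datetime column. It casts back to `[ns]` explicitly because, depending on the pandas version, assigning a coarser-unit array may otherwise be stored at a different resolution than `[ns]`.

```python
import pandas as pd


def truncate_to_unit(series, unit):
    """Truncate a tz-naive datetime Series to a NumPy unit such as 'Y', 'M', 'D', 'h'.

    Hypothetical helper for illustration only.
    """
    # Casting to a coarser datetime64 unit drops the finer components.
    coarse = series.to_numpy().astype(f"datetime64[{unit}]")
    # Cast back to nanoseconds so the result dtype is the same on every pandas version.
    return pd.Series(coarse.astype("datetime64[ns]"), index=series.index, name=series.name)


df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-11-15 13:08:17",
    "2015-03-31 17:39:24",
])})

df["hour"] = truncate_to_unit(df["dt"], "h")
df["month"] = truncate_to_unit(df["dt"], "M")
df["year"] = truncate_to_unit(df["dt"], "Y")

print(df)
```

Month and year are calendar units of varying length, which is why they work here but not with a fixed-frequency method like `floor`.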
  • 2020-12-02 13:52

    One method I've used in the past for this is the following (quite similar to what you're already doing, but I thought I'd mention it anyway):

    df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
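
One caveat with `replace` is that it resets only the fields you name, so sub-second components survive unless they are zeroed as well. A small sketch, using a made-up timestamp with microseconds:

```python
import pandas as pd

# Illustrative timestamp with a sub-second part (not from the question's data).
ts = pd.Timestamp("2014-10-01 10:02:45.123456")

# Only minute and second are reset; the microseconds are kept.
partial = ts.replace(minute=0, second=0)

# Resetting every field below the hour gives a true truncation.
full = ts.replace(minute=0, second=0, microsecond=0, nanosecond=0)

print(partial)
print(full)

# Applied element-wise, the full version agrees with the vectorized dt.floor('h').
s = pd.Series([ts, pd.Timestamp("2014-10-01 13:08:17")])
via_apply = s.apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0)
)
via_floor = s.dt.floor("h")
print((via_apply == via_floor).all())
```

Since `apply` calls the lambda once per row, `dt.floor('h')` is likely the faster choice on large frames.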
    