Convert pandas DatetimeIndex to Unix time?

故里飘歌 2020-11-28 06:22

What is the idiomatic way to convert a pandas DatetimeIndex to (an iterable of) Unix time? This is probably not the way to go:

[time.mktime(t.timetuple()         


        
6 answers
  • 2020-11-28 06:58

    A summary of other answers:

    df['<time_col>'].astype(np.int64) // 10**9
    

    If you want to keep millisecond precision, divide by 10**6 instead.
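A minimal sketch of both divisions on a small frame. The column name `when` and the sample values are made up for illustration, and the column is built explicitly as `datetime64[ns]`, since the divisors above assume nanosecond storage:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; "when" is an illustrative column name.
# The array is created as datetime64[ns] so the integers below are nanoseconds.
df = pd.DataFrame({
    "when": np.array(
        ["2019-08-21T00:00:00.250", "2018-07-28T12:30:00.999"],
        dtype="datetime64[ns]",
    )
})

ns = df["when"].astype(np.int64)  # nanoseconds since 1970-01-01 00:00:00

seconds = ns // 10**9  # whole seconds, sub-second part discarded
millis = ns // 10**6   # whole milliseconds

print(seconds.tolist())
print(millis.tolist())
```

If the column is stored at a different resolution, the divisors would need to change accordingly.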

  • 2020-11-28 07:08

    To handle NaT, which the solutions above convert to large negative integers, a possible solution in pandas>=0.24 is:

    def datetime_to_epoch(ser):
        """Don't convert NaT to large negative values."""
        if ser.hasnans:
            res = ser.dropna().astype('int64').astype('Int64').reindex(index=ser.index)
        else:
            res = ser.astype('int64')
    
        return res // 10**9
    

    When missing values are present, this returns the nullable integer type 'Int64' (extension type pd.Int64Dtype):

    In [5]: dt = pd.to_datetime(pd.Series(["2019-08-21", "2018-07-28", np.nan]))                                                                                                                                                                                                    
    In [6]: datetime_to_epoch(dt)                                                                                                                                                                                                                                                   
    Out[6]: 
    0    1566345600
    1    1532736000
    2           NaN
    dtype: Int64
    

    Otherwise a regular int64:

    In [7]: datetime_to_epoch(dt[:2])                                                                                                                                                                                                                                               
    Out[7]: 
    0    1566345600
    1    1532736000
    dtype: int64
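To see where the "large negative" values come from, here is a small sketch that uses NumPy directly rather than pandas (an assumption made to keep the example independent of the pandas version). NaT is stored as the smallest 64-bit integer, so a plain integer cast followed by floor division turns it into a meaningless negative epoch:

```python
import numpy as np

stamps = np.array(["2019-08-21", "NaT"], dtype="datetime64[ns]")

raw = stamps.astype("int64")       # NaT becomes the int64 minimum
sentinel = np.iinfo(np.int64).min  # -9223372036854775808

seconds = raw // 10**9             # the NaT slot is now a bogus negative epoch
valid = ~np.isnat(stamps)          # mask of entries that are real timestamps

print(raw[1] == sentinel)
print(seconds[valid].tolist())
```

Filtering with the `valid` mask, or dropping the missing values first as `datetime_to_epoch` does, keeps the sentinel out of the result.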
    
  • 2020-11-28 07:08

    If you have tried this on the datetime column of your dataframe:

    dframe['datetime'].astype(np.int64) // 10**9
    

    and you get the following error: TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp', you can use these two lines instead:

    dframe.index = pd.DatetimeIndex(dframe['datetime'])
    dframe['datetime'] = dframe.index.astype(np.int64) // 10**9
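This error usually suggests the column has object dtype and holds individual Timestamp objects rather than a true datetime64 column. Under that assumption, a sketch of an alternative fix is to convert the column with `pd.to_datetime` first. The sample data is made up, and the cast to `datetime64[ns]` is there only to guarantee nanosecond storage:

```python
import pandas as pd

# Hypothetical column of Timestamp objects stored with object dtype.
raw = pd.Series(
    [pd.Timestamp("2012-05-01"), pd.Timestamp("2012-05-02")],
    dtype=object,
)

# Convert to a real datetime column, forcing nanosecond resolution.
fixed = pd.to_datetime(raw).astype("datetime64[ns]")

epoch_seconds = fixed.astype("int64") // 10**9
print(epoch_seconds.tolist())
```

Unlike the two-line fix above, this leaves the frame's index untouched.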
    
  • 2020-11-28 07:09

    Because a DatetimeIndex is backed by an ndarray under the hood, you can do the conversion without a list comprehension, which is much faster.

    In [1]: import numpy as np
    
    In [2]: import pandas as pd
    
    In [3]: from datetime import datetime
    
    In [4]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
       ...: index = pd.DatetimeIndex(dates)
       ...: 
    In [5]: index.astype(np.int64)
    Out[5]: array([1335830400000000000, 1335916800000000000, 1336003200000000000], 
            dtype=int64)
    
    In [6]: index.astype(np.int64) // 10**9
    Out[6]: array([1335830400, 1335916800, 1336003200], dtype=int64)
    
    %timeit [t.value // 10 ** 9 for t in index]
    10000 loops, best of 3: 119 us per loop
    
    %timeit index.astype(np.int64) // 10**9
    100000 loops, best of 3: 18.4 us per loop
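The integer cast above relies on the index storing nanoseconds. As a hedged alternative, the sketch below subtracts the epoch and floor-divides by a one-second Timedelta, which is also vectorized and does not depend on the storage resolution. The variable names are illustrative:

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
index = pd.DatetimeIndex(dates)

# Elapsed time since the Unix epoch, then floor division by one second.
epoch = pd.Timestamp("1970-01-01")
seconds = (index - epoch) // pd.Timedelta("1s")

print(list(seconds))
```

This assumes a timezone-naive index; a timezone-aware index would need a timezone-aware epoch to subtract.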
    
  • 2020-11-28 07:10

    Note: a Timestamp is just Unix time in nanoseconds (so divide it by 10**9):

    [t.value // 10 ** 9 for t in tsframe.index]
    

    For example:

    In [1]: t = pd.Timestamp('2000-02-11 00:00:00')
    
    In [2]: t
    Out[2]: <Timestamp: 2000-02-11 00:00:00>
    
    In [3]: t.value
    Out[3]: 950227200000000000L
    
    In [4]: time.mktime(t.timetuple())
    Out[4]: 950227200.0
    

    As @root points out, it's faster to extract the array of values directly:

    tsframe.index.astype(np.int64) // 10 ** 9
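One caveat about the `time.mktime` comparison in the example: `time.mktime` interprets the time tuple in the machine's local timezone, so it matches `t.value // 10**9` only when the local timezone is UTC. A small sketch using `calendar.timegm`, which treats the tuple as UTC, gives a comparison that holds regardless of the local timezone:

```python
import calendar

import pandas as pd

t = pd.Timestamp("2000-02-11 00:00:00")

from_value = t.value // 10**9                  # nanoseconds to whole seconds
from_timegm = calendar.timegm(t.timetuple())   # tuple interpreted as UTC

print(from_value, from_timegm)
```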
    
  • 2020-11-28 07:13

    Complementing the other answers: // 10**9 is a floor division, which yields the number of whole elapsed seconds rather than the nearest second. A simple way to round to the nearest second instead, if that is desired, is to add 5*10**8 - 1 before the floor division.
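A short sketch of the difference on three made-up nanosecond values around the 1.5 second mark. Note that with the `5*10**8 - 1` offset an exact half second rounds down; adding `5*10**8` instead (shown for comparison, not part of the answer) rounds it up:

```python
import numpy as np

# Nanosecond counts just below, exactly at, and just above 1.5 seconds.
ns = np.array([1_499_999_999, 1_500_000_000, 1_500_000_001], dtype=np.int64)

floored = ns // 10**9                      # whole elapsed seconds
rounded = (ns + 5 * 10**8 - 1) // 10**9    # nearest second, exact halves round down
half_up = (ns + 5 * 10**8) // 10**9        # nearest second, exact halves round up

print(floored.tolist())
print(rounded.tolist())
print(half_up.tolist())
```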
