Pandas converting row with unix timestamp (in milliseconds) to datetime

别那么骄傲 2020-12-08 07:24

I need to process a huge number of CSV files in which the timestamp is always a string representing the Unix timestamp in milliseconds. I could not find a method yet to modify

4 answers
  • 2020-12-08 07:36

    I came up with a solution I guess:

    convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)
    
    df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)
    

    I'm still not sure if this is the best one though.
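
One caveat about the lambda above, sketched here under assumptions: `datetime.datetime.fromtimestamp` without a `tz` argument returns a naive datetime in the machine's local time zone, whereas `pd.to_datetime(..., unit='ms')` treats the value as UTC. The sample value below is hypothetical (chosen to match the outputs shown in the other answers). Separately, `date_parser` has been deprecated in `read_csv` since pandas 2.0, so converting after loading may be the more durable route.

```python
import datetime

import pandas as pd

ms = "1447160702320"  # hypothetical sample value, as it would appear in the CSV

# Without tz, fromtimestamp returns a naive datetime in the local time zone,
# so this result depends on the machine the code runs on.
local_naive = datetime.datetime.fromtimestamp(float(ms) / 1e3)

# Passing tz makes the result explicit and machine-independent.
utc_aware = datetime.datetime.fromtimestamp(float(ms) / 1e3, tz=datetime.timezone.utc)

# pandas interprets epoch values as UTC and returns a naive Timestamp.
pandas_naive = pd.to_datetime(int(ms), unit="ms")

print(local_naive, utc_aware, pandas_naive, sep="\n")
```

If the machine is not set to UTC, the first printed value differs from the other two by the local UTC offset.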

  • 2020-12-08 07:39

    If you know the timestamp unit, use Series.astype:

    df['UNIXTIME'].astype('datetime64[ms]')
    
    0   2015-11-10 13:05:02.320
    1   2015-11-10 13:05:02.364
    2   2015-11-10 13:05:22.364
    Name: UNIXTIME, dtype: datetime64[ns]
    

    To convert the column and get the entire DataFrame back, use DataFrame.astype with a dict:

    df.astype({'UNIXTIME': 'datetime64[ms]'})
    
       RUN                UNIXTIME  VALUE
    0    1 2015-11-10 13:05:02.320     10
    1    2 2015-11-10 13:05:02.364     20
    2    3 2015-11-10 13:05:22.364     42
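
The `astype` calls above assume the column already holds integers, which is what `read_csv` normally produces for an all-digit column. If the values really are strings (for example after reading with `dtype=str`), a minimal sketch is to cast to `int64` first; casting the strings straight to a datetime dtype is not something to rely on, since they may be treated as date strings rather than epoch values. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: epoch milliseconds stored as strings.
raw = pd.Series(["1447160702320", "1447160702364", "1447160722364"], name="UNIXTIME")

# Step 1: turn the digit strings into 64-bit integers.
as_int = raw.astype("int64")

# Step 2: reinterpret the integers as milliseconds since the Unix epoch.
converted = as_int.astype("datetime64[ms]")

print(converted)
```

The resulting dtype may be reported as `datetime64[ms]` or `datetime64[ns]` depending on the pandas version; the values are the same either way.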
    
  • 2020-12-08 07:42

    I use @EdChum's solution, but add time zone handling:

    df['UNIXTIME'] = pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))\
                       .tz_localize('UTC')\
                       .tz_convert('America/New_York')
    

    tz_localize declares that the timestamps are to be interpreted as UTC; tz_convert then shifts the date/time to the target time zone (in this case 'America/New_York').

    Note that the result is wrapped in a DatetimeIndex because the tz_ methods of a Series operate only on its index, not on its values. Since pandas 0.15 you can use the .dt accessor instead:

    df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')\
                       .dt.tz_localize('UTC')\
                       .dt.tz_convert('America/New_York')
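
A self-contained sketch of the `.dt` variant, using hypothetical sample values that match the outputs in the other answers. It also shows that passing `utc=True` to `to_datetime` can stand in for the separate `tz_localize('UTC')` step; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: epoch milliseconds as integers.
df = pd.DataFrame({"UNIXTIME": [1447160702320, 1447160702364, 1447160722364]})

# Two-step form: parse as naive UTC, mark as UTC, then convert.
two_step = (
    pd.to_datetime(df["UNIXTIME"], unit="ms")
    .dt.tz_localize("UTC")
    .dt.tz_convert("America/New_York")
)

# Shorter form: utc=True yields UTC-aware values directly.
one_step = pd.to_datetime(df["UNIXTIME"], unit="ms", utc=True).dt.tz_convert(
    "America/New_York"
)

print(two_step)
```

Both forms describe the same instants; New York is five hours behind UTC on that date, so 13:05:02.320 UTC is displayed as 08:05:02.320-05:00.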
    
  • 2020-12-08 07:48

    You can do this as a post-processing step using to_datetime and passing the argument unit='ms':

    In [5]:
    df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
    df
    
    Out[5]:
       RUN                UNIXTIME  VALUE
    0    1 2015-11-10 13:05:02.320     10
    1    2 2015-11-10 13:05:02.364     20
    2    3 2015-11-10 13:05:22.364     42
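
Since the question mentions many CSV files, the post-processing step can be wrapped in a small helper and applied per file. This is only a sketch: `read_with_timestamps` is a hypothetical name, and the CSV content is sample data reconstructed from the output above, not the asker's actual file.

```python
from io import StringIO

import pandas as pd


def read_with_timestamps(source, column="UNIXTIME"):
    """Read a CSV and convert `column` from epoch milliseconds to datetime."""
    df = pd.read_csv(source)
    df[column] = pd.to_datetime(df[column], unit="ms")
    return df


# Hypothetical sample standing in for one of the CSV files.
sample = StringIO(
    "RUN,UNIXTIME,VALUE\n"
    "1,1447160702320,10\n"
    "2,1447160702364,20\n"
    "3,1447160722364,42\n"
)

df = read_with_timestamps(sample)
print(df)
```

`source` can be a path or any file-like object, so the same helper works in a loop over file names.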
    