I need to process a huge number of CSV files where the timestamp is always a string representing the Unix timestamp in milliseconds. I could not find a method yet to modify these columns efficiently, so I came up with this solution:
import datetime
from io import StringIO
import pandas as pd

# parse each millisecond epoch string into a datetime
convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)
df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)
I'm still not sure if this is the best one though.
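For reference, here is a self-contained version of that snippet. The CSV content is an assumption, reconstructed from the converted output shown in the answers below:

import datetime
from io import StringIO
import pandas as pd

# Assumed sample data, inferred from the answers' output below
data = """RUN,UNIXTIME,VALUE
1,1447160702320,10
2,1447160702364,20
3,1447160722364,42"""

convert = lambda x: datetime.datetime.fromtimestamp(float(x) / 1e3)
# note: date_parser is deprecated since pandas 2.0; see the answers below for alternatives
df = pd.read_csv(StringIO(data), parse_dates=['UNIXTIME'], date_parser=convert)
print(df.dtypes)  # UNIXTIME becomes datetime64[ns]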
If you know the timestamp unit, use Series.astype:

df['UNIXTIME'].astype('datetime64[ms]')
0 2015-11-10 13:05:02.320
1 2015-11-10 13:05:02.364
2 2015-11-10 13:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
To return the entire DataFrame, use
df.astype({'UNIXTIME': 'datetime64[ms]'})
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42
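As a runnable sketch of the above, with the frame built by hand to match that table:

import pandas as pd

# Hand-built frame matching the sample output above
df = pd.DataFrame({'RUN': [1, 2, 3],
                   'UNIXTIME': [1447160702320, 1447160702364, 1447160722364],
                   'VALUE': [10, 20, 42]})

# The integers are interpreted as milliseconds since the Unix epoch
df = df.astype({'UNIXTIME': 'datetime64[ms]'})
print(df)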
I use @EdChum's solution, but with timezone handling added:
df['UNIXTIME'] = pd.DatetimeIndex(pd.to_datetime(df['UNIXTIME'], unit='ms'))\
    .tz_localize('UTC')\
    .tz_convert('America/New_York')
Here tz_localize indicates that the timestamps should be interpreted as UTC, and tz_convert then actually moves the date/time to the target timezone (in this case 'America/New_York'). The series is first wrapped in a DatetimeIndex because the tz_* methods work only on an index, not on a plain series. Since pandas 0.15 you can use the .dt accessor instead:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')\
    .dt.tz_localize('UTC')\
    .dt.tz_convert('America/New_York')
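A quick check of what this produces, using the same assumed millisecond values as above (the offset is -05:00 because 2015-11-10 falls outside daylight saving time):

import pandas as pd

s = pd.Series([1447160702320, 1447160702364, 1447160722364], name='UNIXTIME')
converted = (pd.to_datetime(s, unit='ms')
               .dt.tz_localize('UTC')
               .dt.tz_convert('America/New_York'))
print(converted)
# 0   2015-11-10 08:05:02.320000-05:00
# 1   2015-11-10 08:05:02.364000-05:00
# 2   2015-11-10 08:05:22.364000-05:00
# Name: UNIXTIME, dtype: datetime64[ns, America/New_York]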
You can do this as a post-processing step using to_datetime and passing arg unit='ms':
In [5]:
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
df
Out[5]:
RUN UNIXTIME VALUE
0 1 2015-11-10 13:05:02.320 10
1 2 2015-11-10 13:05:02.364 20
2 3 2015-11-10 13:05:22.364 42
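Putting this together with the CSV read, again using the assumed sample data reconstructed above:

from io import StringIO
import pandas as pd

# Assumed CSV content, reconstructed from the output above
data = """RUN,UNIXTIME,VALUE
1,1447160702320,10
2,1447160702364,20
3,1447160722364,42"""

df = pd.read_csv(StringIO(data))
df['UNIXTIME'] = pd.to_datetime(df['UNIXTIME'], unit='ms')
print(df.dtypes)
# RUN                  int64
# UNIXTIME    datetime64[ns]
# VALUE                int64
# dtype: object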