Question
I do the following operations:
- Convert a string datetime in a pandas DataFrame to a Python datetime via apply(strptime)
- Convert the datetime to a POSIX timestamp via the .timestamp() method
- If I convert the POSIX timestamp back to a datetime with .fromtimestamp(), I obtain a different datetime
It differs by 3 hours, which matches my timezone offset (I'm at UTC+3 now), so I suppose it is some kind of timezone issue. I also understand that apply implicitly converts the values to pandas.Timestamp, but I don't understand why that makes a difference in this case.
What is the reason for this strange behavior, and what should I do to avoid it? In my project I need to compare these pandas timestamps with correct POSIX timestamps, and at the moment the comparison gives wrong results.
Below is a dummy reproducible example:
import datetime
import pandas as pd

df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
df['c'] = df['c'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
dt = df['c'].iloc[0]
dt
>> Timestamp('2018-03-03 14:30:00')
datetime.datetime.fromtimestamp(dt.timestamp())
>> datetime.datetime(2018, 3, 3, 17, 30)
Answer 1:
First, I suggest using the np.datetime64 dtype when working with pandas. In this case it makes the round trip simple.
pd.to_datetime('2018-03-03 14:30:00').value
#1520087400000000000
pd.to_datetime(pd.to_datetime('2018-03-03 14:30:00').value)
#Timestamp('2018-03-03 14:30:00')
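The issue with the other methods is that POSIX time has UTC as its origin. pandas computes Timestamp.timestamp() for a timezone-naive value as if it were UTC, whereas datetime.fromtimestamp returns local time by default. If your system's local timezone isn't UTC, the round trip is shifted by your UTC offset. The following methods will remedy this: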
from datetime import datetime
import pytz
dt
#Timestamp('2018-03-03 14:30:00')
# Seemingly problematic (the result depends on the machine's local timezone; here US/Eastern, UTC-5):
datetime.fromtimestamp(dt.timestamp())
#datetime.datetime(2018, 3, 3, 9, 30)
datetime.fromtimestamp(dt.timestamp(), tz=pytz.utc)
#datetime.datetime(2018, 3, 3, 14, 30, tzinfo=<UTC>)
datetime.combine(dt.date(), dt.timetz())
#datetime.datetime(2018, 3, 3, 14, 30)
mytz = pytz.timezone('US/Eastern') # Use your own local timezone
datetime.fromtimestamp(mytz.localize(dt).timestamp())
#datetime.datetime(2018, 3, 3, 14, 30)
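To make the asymmetry explicit, here is a minimal sketch using only the standard library and pandas. It assumes the same naive value as in the question. The variable names (ts, posix, roundtrip, local_posix) are just for illustration.

```python
import datetime
import pandas as pd

ts = pd.Timestamp('2018-03-03 14:30:00')  # timezone-naive

# pandas treats a naive Timestamp as UTC when computing the POSIX value.
posix = ts.timestamp()  # 1520087400.0, the same instant as ts.value / 1e9

# Ask for UTC explicitly on the way back, then drop tzinfo to get a naive value again.
utc_dt = datetime.datetime.fromtimestamp(posix, tz=datetime.timezone.utc)
roundtrip = utc_dt.replace(tzinfo=None)

# A plain naive datetime is interpreted as *local* time instead, so this
# value differs from posix by the machine's UTC offset (zero on a UTC machine).
naive_dt = ts.to_pydatetime()
local_posix = naive_dt.timestamp()
```

With this, roundtrip equals the original 14:30 regardless of the machine's timezone, while posix - local_posix gives your local UTC offset in seconds (10800 at UTC+3).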
Answer 2:
An answer with the to_datetime function:
df = pd.DataFrame(['2018-03-03 14:30:00'], columns=['c'])
df['c'] = pd.to_datetime(df['c'].values, dayfirst=False).tz_localize('Your/Timezone')
When working with dates, you should always attach a timezone; it makes them easier to work with afterwards.
This does not explain the difference between a pandas Timestamp and a plain datetime, though.
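As a rough sketch of how the timezone-aware approach round-trips, assuming the asker's zone is 'Europe/Moscow' (a fixed UTC+3 zone chosen here only for illustration; substitute your own):

```python
import pandas as pd

df = pd.DataFrame({'c': ['2018-03-03 14:30:00']})

# Parse the strings, then declare which zone the wall-clock times belong to.
# 'Europe/Moscow' is an assumed example zone (UTC+3).
df['c'] = pd.to_datetime(df['c']).dt.tz_localize('Europe/Moscow')

aware = df['c'].iloc[0]

# An aware Timestamp has an unambiguous POSIX value: 14:30 at UTC+3 is 11:30 UTC.
posix = aware.timestamp()

# Convert back, stating the target zone explicitly instead of relying on the system zone.
restored = pd.Timestamp(posix, unit='s', tz='Europe/Moscow')
```

Once the column is timezone-aware, the POSIX values no longer depend on how a naive value is interpreted, so they can be compared directly with timestamps from other sources.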
Source: https://stackoverflow.com/questions/57465747/strange-behavior-with-pandas-timestamp-to-posix-conversion