I am trying to retrieve the date from an email. At first it's easy:
message = email.parser.Parser().parse(file)
date = message['Date']
print date
and I receive:
'Mon, 16 Nov 2009 13:32:02 +0100'
But I need a nice datetime object, so I use:
datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')
which raises ValueError, since %Z isn't the right format for +0100. But I can't find the proper format for the time zone in the documentation; there is only this %Z for zone. Can someone help me with that?
email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.
>>> import email.utils
>>> import time
>>> import datetime
>>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
(2009, 11, 16, 13, 32, 2, 0, 1, -1)
>>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
1258378322.0
>>> datetime.datetime.fromtimestamp(1258378322.0)
datetime.datetime(2009, 11, 16, 13, 32, 2)
Please note, however, that parsedate does not take the time zone into account, and time.mktime always expects a local time tuple.
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
...  time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
True
So you'll still need to parse out the time zone and take into account the local time difference, too:
>>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
>>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) +
... time.timezone - REMOTE_TIME_ZONE_OFFSET)
1258410122.0
Use email.utils.parsedate_tz(date):
msg = email.message_from_file(open(file_name))
date = None
date_str = msg.get('date')
if date_str:
    date_tuple = email.utils.parsedate_tz(date_str)
    if date_tuple:
        date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
if date:
    ...  # valid date found
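As a quick check that parsedate_tz() together with mktime_tz() does honour the offset (unlike the parsedate()/time.mktime() pair), here is a minimal sketch. The variable names are only illustrative, and it assumes a Python version where mktime_tz is not affected by the old bug mentioned elsewhere in this thread:

```python
from email.utils import mktime_tz, parsedate_tz

# Same wall-clock time, two different UTC offsets.
ts_plus_1 = mktime_tz(parsedate_tz('Mon, 16 Nov 2009 13:32:02 +0100'))
ts_plus_9 = mktime_tz(parsedate_tz('Mon, 16 Nov 2009 13:32:02 +0900'))

# The +0900 instant happens 8 hours earlier in UTC than the +0100 one.
print(ts_plus_1 - ts_plus_9)  # difference in seconds
```

The result is a POSIX timestamp, so it does not depend on the local time zone of the machine running the code.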
In Python 3.3+, the email package can parse the headers for you:
import email
import email.policy
headers = email.message_from_file(file, policy=email.policy.default)
print(headers.get('date').datetime)
# -> 2009-11-16 13:32:02+01:00
Since Python 3.2, it works if you replace %Z with %z:
>>> from datetime import datetime
>>> datetime.strptime("Mon, 16 Nov 2009 13:32:02 +0100",
... "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2009, 11, 16, 13, 32, 2,
tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Or using the email package (Python 3.3+):
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime("Mon, 16 Nov 2009 13:32:02 +0100")
datetime.datetime(2009, 11, 16, 13, 32, 2,
tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
If the UTC offset is specified as -0000, it returns a naive datetime object that represents time in UTC; otherwise it returns an aware datetime object with the corresponding tzinfo set.
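The difference between the two cases can be seen with a short sketch (assuming Python 3.3+; the variable names are only illustrative):

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

aware = parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
naive = parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 -0000')

print(aware.utcoffset())  # offset of the aware datetime as a timedelta
print(naive.tzinfo)       # None, because -0000 yields a naive datetime
```

Code that mixes both kinds of values should check tzinfo before comparing or subtracting them, since Python refuses to compare naive and aware datetimes.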
To parse an RFC 5322 date-time string on earlier Python versions (2.6+):
from calendar import timegm
from datetime import datetime, timedelta, tzinfo
from email.utils import parsedate_tz
ZERO = timedelta(0)
time_string = 'Mon, 16 Nov 2009 13:32:02 +0100'
tt = parsedate_tz(time_string)
#NOTE: mktime_tz is broken on Python < 2.7.4,
# see https://bugs.python.org/issue21267
timestamp = timegm(tt) - tt[9] # local time - utc offset == utc time
naive_utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
aware_utc_dt = naive_utc_dt.replace(tzinfo=FixedOffset(ZERO, 'UTC'))
aware_dt = aware_utc_dt.astimezone(FixedOffset(timedelta(seconds=tt[9])))
print(aware_utc_dt)
print(aware_dt)
# -> 2009-11-16 12:32:02+00:00
# -> 2009-11-16 13:32:02+01:00
where FixedOffset is based on the tzinfo subclass from the datetime documentation:
class FixedOffset(tzinfo):
    """Fixed UTC offset: `time = utc_time + utc_offset`."""
    def __init__(self, offset, name=None):
        self.__offset = offset
        if name is None:
            seconds = abs(offset).seconds
            assert abs(offset).days == 0
            hours, seconds = divmod(seconds, 3600)
            if offset < ZERO:
                hours = -hours
            minutes, seconds = divmod(seconds, 60)
            assert seconds == 0
            #NOTE: the last part is to remind about deprecated POSIX
            #  GMT+h timezones that have the opposite sign in the
            #  name; the corresponding numeric value is not used e.g.,
            #  no minutes
            self.__name = '<%+03d%02d>GMT%+d' % (hours, minutes, -hours)
        else:
            self.__name = name
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return ZERO
    def __repr__(self):
        return 'FixedOffset(%r, %r)' % (self.utcoffset(), self.tzname())
For Python 3.3+ you can use the parsedate_to_datetime function:
>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('Mon, 16 Nov 2009 13:32:02 +0100')
...
datetime.datetime(2009, 11, 16, 13, 32, 2, tzinfo=datetime.timezone(datetime.timedelta(0, 3600)))
Official documentation:
The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding a timezone tzinfo. New in version 3.3.
Have you tried
rfc822.parsedate_tz(date) # ?
More on RFC822, http://docs.python.org/library/rfc822.html
It's deprecated (parsedate_tz is now in email.utils.parsedate_tz), though.
But maybe these answers help:
# Parses Nginx' format of "01/Jan/1999:13:59:59 +0400"
# Unfortunately, strptime doesn't support %z for the UTC offset (despite what
# the docs actually say), hence the need for this function.
def parseDate(dateStr):
    date = datetime.datetime.strptime(dateStr[:-6], "%d/%b/%Y:%H:%M:%S")
    offsetDir = dateStr[-5]
    offsetHours = int(dateStr[-4:-2])
    offsetMins = int(dateStr[-2:])
    if offsetDir == "-":
        offsetHours = -offsetHours
        offsetMins = -offsetMins
    return date + datetime.timedelta(hours=offsetHours, minutes=offsetMins)
For those who want to get the correct local time, here is what I did:
from datetime import datetime
from email.utils import parsedate_to_datetime
mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'
local_time_str = datetime.fromtimestamp(parsedate_to_datetime(mail_time_str).timestamp()).strftime('%Y-%m-%d %H:%M:%S')
print(local_time_str)
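A possibly simpler variant, sketched under the assumption that the header carries a real offset (not -0000) and that Python 3.3+ is available: calling astimezone() with no argument converts an aware datetime to the system's local time zone and keeps it aware. The names below are only illustrative:

```python
from email.utils import parsedate_to_datetime

mail_time_str = 'Mon, 16 Nov 2009 13:32:02 +0100'
aware_dt = parsedate_to_datetime(mail_time_str)

# With no argument, astimezone() targets the system's local time zone.
local_dt = aware_dt.astimezone()
print(local_dt.strftime('%Y-%m-%d %H:%M:%S %z'))
```

The printed value depends on the time zone of the machine, but local_dt always refers to the same instant as aware_dt.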
ValueError: 'z' is a bad directive in format...
(note: I have to stick to python 2.7 in my case)
I have had a similar problem parsing commit dates from the output of git log --date=iso8601, which actually isn't the ISO8601 format (hence the addition of --date=iso8601-strict in a later version).
Since I am using Django, I can leverage the utilities there.
https://github.com/django/django/blob/master/django/utils/dateparse.py
>>> from django.utils.dateparse import parse_datetime
>>> parse_datetime('2013-07-23T15:10:59.342107+01:00')
datetime.datetime(2013, 7, 23, 15, 10, 59, 342107, tzinfo=+0100)
Instead of strptime you could use your own regular expression.
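A minimal sketch of that idea, which should also run on Python 2.7 where %z is unavailable. The pattern and the function name parse_to_naive_utc are hypothetical, and the sketch assumes an English (C) locale and a numeric offset such as +0100 at the end of the string:

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: everything before the trailing "+HHMM"/"-HHMM" is the
# date-time part, the rest is the numeric UTC offset.
_OFFSET_RE = re.compile(
    r'^(?P<stamp>.+?)\s+(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})$')

def parse_to_naive_utc(date_str):
    """Return a naive datetime in UTC for a date string ending in a numeric offset."""
    match = _OFFSET_RE.match(date_str.strip())
    if match is None:
        raise ValueError('no numeric UTC offset found in %r' % date_str)
    # Parse the part that strptime can handle without %z.
    local = datetime.strptime(match.group('stamp'), '%a, %d %b %Y %H:%M:%S')
    offset = timedelta(hours=int(match.group('hh')),
                       minutes=int(match.group('mm')))
    if match.group('sign') == '-':
        offset = -offset
    # local time - utc offset == utc time
    return local - offset

print(parse_to_naive_utc('Mon, 16 Nov 2009 13:32:02 +0100'))
```

This only covers numeric offsets; named zones such as GMT or EST, which email headers may also contain, are better left to email.utils.parsedate_tz.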
Source: https://stackoverflow.com/questions/1790795/parsing-date-with-timezone-from-an-email