I have a database of mostly correct datetimes, but a few are broken like so: Sat Dec 22 12:34:08 PST 20102015
Without the invalid year, this was working f
Improving (I hope) the code of Adam Rosenfield:
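import time

for end_date in ('Fri Feb 18 20:41:47 Paris, Madrid 2011',
                 'Fri Feb 18 20:41:47 Paris, Madrid 20112015'):
    print(end_date)
    # %Z only accepts UTC, GMT and the names in time.tzname, so
    # "Paris, Madrid" parses only where the machine reports its zone that way.
    fmt = "%a %b %d %H:%M:%S %Z %Y"
    try:
        end_date = time.strptime(end_date, fmt)
    except ValueError as v:
        # The message ends with the leftover text, e.g.
        # "unconverted data remains: 2015"; measure it and cut it off.
        ulr = len(v.args[0].partition('unconverted data remains: ')[2])
        if ulr:
            end_date = time.strptime(end_date[:-ulr], fmt)
        else:
            raise
    print(end_date, '\n')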
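The `ulr` line above depends on the wording of the ValueError that strptime raises. A quick way to see that wording for yourself, sketched with UTC as the zone because `%Z` always accepts UTC and GMT (`leftover_after_parse` and `FMT` are hypothetical names):

```python
import time

FMT = "%a %b %d %H:%M:%S %Z %Y"

def leftover_after_parse(date_string, fmt):
    """Return the text strptime left unconverted, or '' if there is none."""
    try:
        time.strptime(date_string, fmt)
    except ValueError as v:
        # %Y consumes exactly four digits; the rest is echoed in the message.
        return v.args[0].partition('unconverted data remains: ')[2]
    return ''

print(leftover_after_parse("Sat Dec 22 12:34:08 UTC 20102015", FMT))  # 2015
```

If an earlier field fails instead (for instance a zone name `%Z` doesn't recognize), strptime raises "time data ... does not match format ..." with no leftover text, and the helper returns an empty string.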
Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this gives you the result you intend.
For example, you can catch the ValueError, slice, and try again:
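import time

def parse_prefix(line, fmt):
    """Parse line with fmt, ignoring trailing text strptime can't convert."""
    try:
        t = time.strptime(line, fmt)
    except ValueError as v:
        prefix = 'unconverted data remains: '
        if len(v.args) > 0 and v.args[0].startswith(prefix):
            # Drop as many trailing characters as strptime left unconverted.
            line = line[:-(len(v.args[0]) - len(prefix))]
            t = time.strptime(line, fmt)
        else:
            raise
    return t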
For example:
parse_prefix(
    '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
    '%Y-%m-%d %H:%M:%S'
)  # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...
strptime() really expects to see a correctly formatted date, so you probably need to do some munging on the end_date string before you call it.
This is one way to chop the last item in end_date to 4 characters:
chop = len(end_date.split()[-1]) - 4
if chop > 0:  # end_date[:-0] would be '', so leave valid dates alone
    end_date = end_date[:-chop]
Yeah, I'd just chop off the extra numbers. Assuming they are always appended to the date string, something like this would work:
end_date = end_date.split(" ")
end_date[-1] = end_date[-1][:4]
end_date = " ".join(end_date)
I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
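The lop-and-retry fallback could look something like this minimal sketch. `parse_by_trimming` is a hypothetical name, and its `max_trim` cap is an assumption added so that a value that can never match fails with an error instead of being trimmed away entirely:

```python
import time

FMT = "%a %b %d %H:%M:%S %Z %Y"

def parse_by_trimming(date_string, fmt, max_trim=8):
    """Retry strptime, dropping one more trailing character after each failure."""
    for trim in range(max_trim + 1):
        candidate = date_string[:len(date_string) - trim]
        try:
            return time.strptime(candidate, fmt)
        except ValueError:
            continue  # still too long (or otherwise invalid); trim another char
    raise ValueError("could not parse %r with %r" % (date_string, fmt))
```

Unlike reading the leftover count from the message, this works whatever the error text says, at the cost of up to `max_trim + 1` parse attempts per bad value.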
You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
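For comparison, a regex that pins the year to exactly four digits is not much code if it only checks the overall layout. A rough sketch, assuming every value looks like `Day Mon DD HH:MM:SS <zone> <year>`, where the zone may contain spaces as in "Paris, Madrid"; `DATE_RE` and `repair_year` are hypothetical names:

```python
import re

# Everything up to the zone, then four year digits; extra digits are discarded.
DATE_RE = re.compile(r'(\w{3} \w{3} \d{1,2} \d{2}:\d{2}:\d{2} .+?) (\d{4})\d*')

def repair_year(date_string):
    m = DATE_RE.fullmatch(date_string)
    if m is None:
        raise ValueError("unexpected date layout: %r" % date_string)
    return "%s %s" % (m.group(1), m.group(2))
```

The result still has to go through strptime; the regex only normalizes the year and rejects values whose layout is off.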
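Here's an even simpler one-liner I use. Only apply it to values you already know have four extra digits, because on a valid date it strips the real year: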
end_date = end_date[:-4]