How to remove unconverted data from a Python datetime object

别那么骄傲 2020-12-09 15:38

I have a database of mostly correct datetimes, but a few are broken like so: Sat Dec 22 12:34:08 PST 20102015

Without the invalid year, this was working f

5 answers
  • 2020-12-09 16:16

    Improving (I hope) on Adam Rosenfield's code:

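    import time

    # %Z only accepts UTC, GMT and the local zone names in time.tzname, so the
    # 'Paris, Madrid' samples parse only where that is the local zone name.
    for end_date in ('Fri Feb 18 20:41:47 Paris, Madrid 2011',
                     'Fri Feb 18 20:41:47 Paris, Madrid 20112015'):

        print(end_date)

        fmt = "%a %b %d %H:%M:%S %Z %Y"
        try:
            end_date = time.strptime(end_date, fmt)
        except ValueError as v:
            # strptime reports trailing junk as "unconverted data remains: <junk>";
            # measure the junk so it can be sliced off before retrying.
            ulr = len(v.args[0].partition('unconverted data remains: ')[2])
            if ulr:
                end_date = time.strptime(end_date[:-ulr], fmt)
            else:
                raise

        print(end_date, '\n')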
    
  • 2020-12-09 16:17

    Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this will give you the correct result you intend.

    For example, you can catch the ValueError, slice, and try again:

    import time

    PREFIX = 'unconverted data remains: '

    def parse_prefix(line, fmt):
        try:
            t = time.strptime(line, fmt)
        except ValueError as v:
            if v.args and v.args[0].startswith(PREFIX):
                # The text after PREFIX is the unparsed tail; slice it off and retry.
                line = line[:-(len(v.args[0]) - len(PREFIX))]
                t = time.strptime(line, fmt)
            else:
                raise
        return t
    

    For example:

    parse_prefix(
        '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
        '%Y-%m-%d %H:%M:%S'
    ) # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...
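
    Since the question asks for a datetime object, here is a minimal sketch of the same retry using datetime.datetime.strptime, which raises the same "unconverted data remains: ..." ValueError. The name parse_prefix_dt is hypothetical, and the sketch assumes that message wording (CPython's), as the code above does.

```python
from datetime import datetime

PREFIX = 'unconverted data remains: '


def parse_prefix_dt(line, fmt):
    """Hypothetical datetime variant of parse_prefix: drop trailing junk and retry."""
    try:
        return datetime.strptime(line, fmt)
    except ValueError as err:
        message = err.args[0] if err.args else ''
        if not message.startswith(PREFIX):
            raise  # a real format mismatch, not just leftover text
        # Everything after PREFIX is the unparsed tail of `line`.
        leftover = len(message) - len(PREFIX)
        return datetime.strptime(line[:-leftover], fmt)
```

    With the log line above, parse_prefix_dt returns datetime(2015, 10, 15, 11, 33, 20); a correctly formatted string is returned unchanged by the first strptime call.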
    
  • 2020-12-09 16:22

    strptime() really expects to see a correctly formatted date, so you probably need to do some munging on the end_date string before you call it.

    This is one way to trim the last field of end_date to 4 characters:

    chop = len(end_date.split()[-1]) - 4
    if chop > 0:  # end_date[:-0] would be '', so skip already-correct years
        end_date = end_date[:-chop]
    
  • 2020-12-09 16:31

    Yeah, I'd just chop off the extra digits. Assuming they are always appended to the date string, something like this would work:

    end_date = end_date.split(" ")
    end_date[-1] = end_date[-1][:4]
    end_date = " ".join(end_date)
    

    I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
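
    The "lop off one character at a time" idea can be sketched as follows. parse_by_trimming is a hypothetical name, and the sketch assumes the format ends in a fixed-width field such as %Y; with a trailing %S, a one-digit prefix like "12:34:0" would also be accepted. The usage below leaves out %Z, because %Z only matches UTC, GMT and the local zone names, so "PST" may not parse on every machine.

```python
from datetime import datetime


def parse_by_trimming(text, fmt):
    """Hypothetical helper: retry strptime on ever-shorter prefixes of text."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as first_error:
        # Longest prefix first, so the most complete valid parse wins.
        for end in range(len(text) - 1, 0, -1):
            try:
                return datetime.strptime(text[:end], fmt)
            except ValueError:
                continue
        raise first_error  # nothing parsed; report the original problem
```

    For example, parse_by_trimming('Sat Dec 22 12:34:08 20102015', '%a %b %d %H:%M:%S %Y') gives up on "2010201", "201020" and "20102" before succeeding on "2010".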

    You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
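
    A lighter use of a regex than validating the whole date is to cut only the year field back to four digits before parsing. This is a sketch assuming the surplus digits are always glued onto a trailing year; trim_year and _LONG_YEAR are hypothetical names.

```python
import re

# A trailing run of five or more digits: keep the first four, drop the rest.
_LONG_YEAR = re.compile(r'(\d{4})\d+$')


def trim_year(text):
    """Hypothetical helper: shorten an over-long trailing year to 4 digits."""
    return _LONG_YEAR.sub(r'\1', text)
```

    trim_year('Sat Dec 22 12:34:08 PST 20102015') returns 'Sat Dec 22 12:34:08 PST 2010', while a value that already ends in a four-digit year is returned unchanged; the result can then go to strptime with the original format.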

  • 2020-12-09 16:34

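    Here's an even simpler one-liner I use. It assumes every broken value carries exactly four extra characters; applied to an already-correct date it would cut off the year: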

    end_date = end_date[:-4]
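
    To avoid damaging values that are already correct, the slice can be applied only after a plain parse fails. This is a minimal sketch; parse_or_trim and its extra parameter are hypothetical, and the usage omits %Z for the same portability reason as above.

```python
from datetime import datetime


def parse_or_trim(text, fmt, extra=4):
    """Hypothetical helper: parse as-is, else drop `extra` trailing characters."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        if extra <= 0:
            raise  # text[:-0] would be '', so there is nothing sensible to trim
        return datetime.strptime(text[:-extra], fmt)
```

    Both 'Sat Dec 22 12:34:08 2010' and 'Sat Dec 22 12:34:08 20102015' parse to the same datetime with the format '%a %b %d %H:%M:%S %Y'.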
