Parsing a date in Python without using a default

我寻月下人不归 2021-01-04 05:51

I'm using Python's dateutil.parser tool to parse some dates I'm getting from a third-party feed. It allows specifying a default date, which itself defaults ...

4 answers
  • 2021-01-04 06:27

    simple-date does this for you. It tries multiple formats internally, but not as many as you might think, because its patterns extend Python's date patterns with optional parts, much like regexps.

    See https://github.com/andrewcooke/simple-date; it only supports Python 3.2 and up (sorry).

    By default it's more lenient than what you want:

    >>> from simpledate import SimpleDate
    >>> for date in ('2011-10-12', '2011-10', '2011', '10-12', '2011-10-12T11:45:30', '10-12 11:45', ''):
    ...   print(date)
    ...   try: print(SimpleDate(date).naive.datetime)
    ...   except Exception: print('nope')
    ... 
    2011-10-12
    2011-10-12 00:00:00
    2011-10
    2011-10-01 00:00:00
    2011
    2011-01-01 00:00:00
    10-12
    nope
    2011-10-12T11:45:30
    2011-10-12 11:45:30
    10-12 11:45
    nope
    
    nope
    

    But you can specify your own format. For example:

    >>> from simpledate import SimpleDate, SimpleDateParser, invert
    >>> parser = SimpleDateParser(invert('Y-m-d(%T| )?(H:M(:S)?)?'))
    >>> for date in ('2011-10-12', '2011-10', '2011', '10-12', '2011-10-12T11:45:30', '10-12 11:45', ''):
    ...   print(date)
    ...   try: print(SimpleDate(date, date_parser=parser).naive.datetime)
    ...   except Exception: print('nope')
    ... 
    2011-10-12
    2011-10-12 00:00:00
    2011-10
    nope
    2011
    nope
    10-12
    nope
    2011-10-12T11:45:30
    2011-10-12 11:45:30
    10-12 11:45
    nope
    
    nope
    

    P.S. invert() just toggles the presence of %, which otherwise becomes a real mess when specifying complex date patterns. So here only the literal T character needs a % prefix (in standard Python date formatting it would be the only alphanumeric character without a prefix).
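
    To make the % toggling concrete, here is a minimal sketch of the idea in plain Python. The toggle_percent name and its regex are assumptions made for illustration; this is not simple-date's implementation of invert(), and the two may differ in edge cases.

```python
import re

def toggle_percent(pattern):
    """Hypothetical helper: add % to bare letters, strip it from %-prefixed ones."""
    def swap(match):
        token = match.group(0)
        # '%T' becomes the literal 'T'; a bare 'Y' becomes the directive '%Y'.
        return token[1:] if token.startswith('%') else '%' + token
    return re.sub(r'%?[A-Za-z]', swap, pattern)

print(toggle_percent('Y-m-d(%T| )?(H:M(:S)?)?'))
# %Y-%m-%d(T| )?(%H:%M(:%S)?)?
```

    Read this way, the inverted pattern is just the usual strftime-style directives with the regexp-like optional groups left untouched.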

  • 2021-01-04 06:31

    I ran into the exact same problem with dateutil, so I wrote this function and figured I would post it for posterity's sake. It basically uses the underlying _parse method, as @ILYA Khlopotov suggests:

    import datetime
    from dateutil.parser import parser

    _CURRENT_YEAR = datetime.datetime.now().year

    def is_good_date(date):
        """Return dateutil's raw parse result, or None unless year, month and day are all present."""
        try:
            # _parse is a private dateutil method, so its behavior may change between releases.
            parsed_date = parser()._parse(date)
        except Exception:
            return None
        # Newer dateutil releases return a (result, skipped_tokens) tuple.
        if isinstance(parsed_date, tuple):
            parsed_date = parsed_date[0]
        if not parsed_date: return None
        if not parsed_date.year: return None
        if parsed_date.year < 1890 or parsed_date.year > _CURRENT_YEAR: return None
        if not parsed_date.month: return None
        if parsed_date.month < 1 or parsed_date.month > 12: return None
        if not parsed_date.day: return None
        if parsed_date.day < 1 or parsed_date.day > 31: return None
        return parsed_date
    

    The returned object isn't a datetime instance, but it has the .year, .month, and .day attributes, which was good enough for my needs. I suppose you could easily convert it to a datetime instance.
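
    A minimal sketch of that conversion, assuming only that the object exposes .year, .month and .day, plus optional time attributes that may be None. The to_datetime name is hypothetical, and a SimpleNamespace stands in for the real parse result so the example runs without dateutil.

```python
import datetime
from types import SimpleNamespace

def to_datetime(parsed):
    """Build a datetime from an object exposing .year, .month and .day.

    Time fields are optional; missing or None values fall back to 0.
    """
    def time_field(name):
        value = getattr(parsed, name, None)
        return 0 if value is None else value
    return datetime.datetime(
        parsed.year, parsed.month, parsed.day,
        time_field('hour'), time_field('minute'), time_field('second'),
    )

# Stand-in for the object returned by is_good_date(); the real one comes from dateutil.
fake_result = SimpleNamespace(year=2011, month=10, day=12, hour=11, minute=45, second=None)
print(to_datetime(fake_result))  # 2011-10-12 11:45:00
```

    Only the time of day is defaulted here; the date fields have already been checked by is_good_date.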

  • 2021-01-04 06:44

    This is probably a "hack", but it looks like dateutil reads very few attributes of the default you pass in. You could provide a 'fake' datetime that explodes in the desired way.

    >>> import datetime
    >>> import dateutil.parser
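    >>> class NoDefaultDate(object):
    ...     # Newer dateutil releases read these attributes when the day is missing.
    ...     year, month, day = 2000, 1, 1
    ...     def replace(self, **fields):
    ...         # Refuse to fill in a missing year, month, or day.
    ...         if any(f not in fields for f in ('year', 'month', 'day')):
    ...             return None
    ...         return datetime.datetime(2000, 1, 1).replace(**fields)
    >>> def wrap_parse(v):
    ...     try:
    ...         _actual = dateutil.parser.parse(v, default=NoDefaultDate())
    ...     except ValueError:  # some dateutil versions raise on input such as ''
    ...         return None
    ...     return _actual.date() if _actual is not None else None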
    >>> cases = (
    ...   ('2011-10-12', datetime.date(2011, 10, 12)),
    ...   ('2011-10', None),
    ...   ('2011', None),
    ...   ('10-12', None),
    ...   ('2011-10-12T11:45:30', datetime.date(2011, 10, 12)),
    ...   ('10-12 11:45', None),
    ...   ('', None),
    ...   )
    >>> all(wrap_parse(test) == expected for test, expected in cases)
    True
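
    Why this works: as far as I can tell, dateutil collects the fields it found in the string and overlays them on the default by calling default.replace(**fields). The sketch below is a simplified, stdlib-only model of that single step, not dateutil's actual code; StrictDefault and fill_from_default are hypothetical names.

```python
import datetime

class StrictDefault(object):
    """Fake default whose replace() refuses to supply a missing year, month, or day."""
    def replace(self, **fields):
        if any(f not in fields for f in ('year', 'month', 'day')):
            return None
        return datetime.datetime(2000, 1, 1).replace(**fields)

def fill_from_default(fields, default):
    """Simplified model of the parser's last step: overlay parsed fields on the default."""
    return default.replace(**fields)

today = datetime.datetime(2021, 1, 4)
# A real datetime silently fills the gaps...
print(fill_from_default({'year': 2011}, today))            # 2011-01-04 00:00:00
# ...while the fake one reports them as None.
print(fill_from_default({'year': 2011}, StrictDefault()))  # None
print(fill_from_default({'year': 2011, 'month': 10, 'day': 12}, StrictDefault()))  # 2011-10-12 00:00:00
```

    Because the trick relies on how the parser uses its default internally, it is worth re-running the test cases whenever dateutil is upgraded.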
    
  • 2021-01-04 06:49

    Depending on your domain, the following solution might work:

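    import datetime
    import dateutil.parser as parser

    DEFAULT_DATE = datetime.datetime(datetime.MINYEAR, 1, 1)

    def parse_no_default(dt_str):
        dt = parser.parse(dt_str, default=DEFAULT_DATE).date()
        # Compare a date with a date; a date never equals a datetime.
        if dt != DEFAULT_DATE.date():
            return dt
        else:
            return None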
    

    Another approach would be to monkey patch the parser class (this is very hackish, so I wouldn't recommend it if you have other options):

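    import dateutil.parser as parser
    def parse(self, timestr, default=None,
              ignoretz=False, tzinfos=None,
              **kwargs):
        # Skip the default-filling step and return the raw _parse result.
        res = self._parse(timestr, **kwargs)
        # Newer dateutil releases return a (result, skipped_tokens) tuple.
        return res[0] if isinstance(res, tuple) else res
    parser.parser.parse = parse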
    

    You can use it as follows:

    >>> dd = parser.parser().parse('2011-01-02', None)
    >>> dd
    _result(year=2011, month=1, day=2)
    >>> dd = parser.parser().parse('2011', None)
    >>> dd
    _result(year=2011)
    

    By checking which members are available in the result (dd), you can decide when to return None. When all fields are available, you can convert dd into a datetime object:

    # dd might have the following fields:
    # "year", "month", "day", "weekday",
    # "hour", "minute", "second", "microsecond",
    # "tzname", "tzoffset"
    datetime.datetime(dd.year, dd.month, dd.day)
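
    The member check could be sketched as follows, assuming that fields missing from the input show up as None on the result object. result_to_date and REQUIRED_FIELDS are hypothetical names, and SimpleNamespace objects stand in for the raw parse results so the example runs without dateutil.

```python
import datetime
from types import SimpleNamespace

REQUIRED_FIELDS = ('year', 'month', 'day')

def result_to_date(result):
    """Return a date if year, month and day were all parsed, else None."""
    values = [getattr(result, name, None) for name in REQUIRED_FIELDS]
    if any(value is None for value in values):
        return None
    return datetime.date(*values)

# Stand-ins for the raw parse results of '2011-01-02' and '2011'.
full = SimpleNamespace(year=2011, month=1, day=2)
partial = SimpleNamespace(year=2011, month=None, day=None)
print(result_to_date(full))     # 2011-01-02
print(result_to_date(partial))  # None
```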
    