After using parsedatetime to get a time structure from the input string, how does one slice the rest of the string out?

爱一瞬间的悲伤 2021-01-24 04:28

I'm wondering how to use parsedatetime for Python to return both the time struct and the rest of the input string with just the date/time input removed.

Exa

1 answer
  •  有刺的猬
    2021-01-24 04:44

    The only method of Calendar that returns that info is nlp() (which I suppose stands for Natural Language Processing). Here is a function returning all parts of the input:

    import parsedatetime
    
    calendar = parsedatetime.Calendar()
    
    def parse(string, source_time = None):
        ret = []
        parsed_parts = calendar.nlp(string, source_time)
        if parsed_parts:
            last_stop = 0
            for part in parsed_parts:
                dt, status, start, stop, segment = part
                if start > last_stop:
                    ret.append((None, 0, string[last_stop:start]))
                ret.append((dt, status, segment))
                last_stop = stop
            if len(string) > last_stop:
                ret.append((None, 0, string[last_stop:]))
        return ret
    
    for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
              "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
              "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
        print()
        print(s)
        result = parse(s)
        for part in result:
            print(part)
    

    Output:

    Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
    (None, 0, 'Soccer with @homies at Payne Whitney ')
    (datetime.datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm')
    (None, 0, '!')
    
    Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
    (None, 0, 'Soccer with @homies at Payne Whitney ')
    (datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
    (None, 0, ' starting ')
    (datetime.datetime(2020, 1, 14, 16, 0), 2, 'at 2 pm to 4 pm')
    (None, 0, '!')
    
    Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
    (None, 0, 'Soccer with @homies at Payne Whitney ')
    (datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
    (None, 0, ' starting ')
    (datetime.datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm')
    (None, 0, ' to ')
    (datetime.datetime(2020, 1, 14, 17, 0), 2, '5 pm')
    (None, 0, '!')
    

    The status tells you whether the associated datetime is actually a date (1), a time (2), a datetime (3) or neither (0). In the first two cases, the missing fields are taken from the source_time, or from the current time if that is None.
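    As a rough sketch of how the status could be used, the helpers below label each part and merge a date-only part with a time-only part. The names STATUS_LABELS, describe_status and combine_parts are made up for this illustration and are not part of parsedatetime; the status values are assumed to be exactly those described above.

```python
from datetime import datetime

# Assumption: status values as described in the answer
# (0 = neither, 1 = date only, 2 = time only, 3 = date and time).
STATUS_LABELS = {0: "none", 1: "date", 2: "time", 3: "datetime"}


def describe_status(status: int) -> str:
    """Return a readable label for a status value (hypothetical helper)."""
    return STATUS_LABELS.get(status, "unknown")


def combine_parts(date_part: datetime, time_part: datetime) -> datetime:
    """Take the calendar date from a status-1 part and the clock time
    from a status-2 part, discarding the fields that were only filled
    in from the source time."""
    return datetime.combine(date_part.date(), time_part.time())


if __name__ == "__main__":
    tomorrow = datetime(2020, 1, 15, 9, 0)   # parsed from 'tomorrow', status 1
    at_3_pm = datetime(2020, 1, 14, 15, 0)   # parsed from 'at 3 pm', status 2
    print(describe_status(1), describe_status(2))
    print(combine_parts(tomorrow, at_3_pm))
```

    This only helps when the date and the time really end up in separate parts, as in the third example string.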

    But if you examine the output closely, you will see that there is a reliability problem here. Only the third parse is usable; in the other two cases information has been lost (the 2 pm start time is missing from the first result, and the 4 pm end time absorbs the whole "at 2 pm to 4 pm" segment in the second). Furthermore, I have no idea why the second and third strings are parsed differently.

    An alternative library is dateparser. It looks more powerful, but has its own problems. The dateparser.search.search_dates() function comes close to what you are interested in, but I haven't been able to find out how to tell whether a parsed datetime conveys only date information, only time information, or both. Anyway, here is a function that uses search_dates() to yield output similar to the above, but without the status of each part:

    from dateparser.search import search_dates
    
    def parse(string: str):
        ret = []
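        # search_dates() returns a list of (substring, datetime) pairs, or None
        parsed_parts = search_dates(string)
        if parsed_parts:
            last_stop = 0
            for part in parsed_parts:
                segment, dt = part
                # No offsets are returned, so locate the substring ourselves
                start = string.find(segment, last_stop)
                if start < 0:
                    continue  # substring not found verbatim; skip this part
                stop = start + len(segment)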
                if start > last_stop:
                    ret.append((None, string[last_stop:start]))
                ret.append((dt, segment))
                last_stop = stop
            if len(string) > last_stop:
                ret.append((None, string[last_stop:]))
        return ret
    
    
    for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
              "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
              "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
        print()
        print(s)
        result = parse(s)
        for part in result:
            print(part)
    

    Output:

    Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
    (None, 'Soccer with @homies at Payne Whitney ')
    (datetime.datetime(2020, 1, 15, 14, 0), 'tomorrow at 2 pm')
    (None, ' to ')
    (datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
    (None, '!')
    
    Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
    (None, 'Soccer with @homies at Payne Whitney ')
    (datetime.datetime(2020, 1, 15, 0, 43, 0, 726130), 'tomorrow')
    (None, ' starting ')
    (datetime.datetime(2020, 1, 13, 14, 0), 'at 2 pm')
    (None, ' to ')
    (datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
    (None, '!')
    
    Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
    (None, 'Soccer with @homies at Payne Whitney ')
    (datetime.datetime(2020, 1, 15, 0, 43, 0, 784468), 'tomorrow')
    (None, ' starting ')
    (datetime.datetime(2020, 1, 13, 15, 0), 'at 3 pm')
    (None, ' to ')
    (datetime.datetime(2020, 1, 13, 17, 0), '5 pm')
    (None, '!')
    

    I think that searching for the substring in the input is acceptable, and the parsing seems more predictable, but not knowing how to interpret each datetime is a problem.
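    To get back to the original question (the input with the date/time text removed), the list returned by either parse() function above can be post-processed. This is a minimal sketch; split_dates is a hypothetical helper, and it assumes each part is a tuple whose first item is the datetime (or None) and whose last item is the text segment, which holds for both output formats shown above.

```python
from datetime import datetime


def split_dates(parts):
    """Separate parsed parts into (datetimes, remaining_text).

    Each part is assumed to be a tuple whose first element is a
    datetime (or None for plain text) and whose last element is the
    matching text segment.
    """
    datetimes = [part[0] for part in parts if part[0] is not None]
    remainder = "".join(part[-1] for part in parts if part[0] is None)
    return datetimes, remainder


if __name__ == "__main__":
    # Parts copied from the first parsedatetime output above.
    parts = [
        (None, 0, "Soccer with @homies at Payne Whitney "),
        (datetime(2020, 1, 15, 16, 0), 3, "tomorrow at 2 pm to 4 pm"),
        (None, 0, "!"),
    ]
    found, rest = split_dates(parts)
    print(found)
    print(repr(rest))
```

    Whitespace around the removed segments is kept as-is, so the remainder may need a strip() or some space collapsing depending on what you do with it.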
