I'm wondering how to use parsedatetime for Python to return both the timestruct and the rest of the input string with just the date/time input removed.
The only method of `Calendar` that returns that info (i.e. where in the input each date/time expression starts and stops) is `nlp()`, which I suppose stands for Natural Language Processing. Here is a function returning all parts of the input:
```python
import parsedatetime

calendar = parsedatetime.Calendar()

def parse(string, source_time=None):
    """Split `string` into date/time parts and plain-text parts, in order."""
    ret = []
    last_stop = 0
    # nlp() returns None, or a tuple of (datetime, status, start, stop, segment)
    parsed_parts = calendar.nlp(string, source_time)
    if parsed_parts:
        for part in parsed_parts:
            dt, status, start, stop, segment = part
            # Keep any plain text between the previous match and this one
            if start > last_stop:
                ret.append((None, 0, string[last_stop:start]))
            ret.append((dt, status, segment))
            last_stop = stop
    # Keep trailing text (or the whole string if nothing was parsed)
    if len(string) > last_stop:
        ret.append((None, 0, string[last_stop:]))
    return ret

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)
```
Output:
```
Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 16, 0), 3, 'tomorrow at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 16, 0), 2, 'at 2 pm to 4 pm')
(None, 0, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 0, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 9, 0), 1, 'tomorrow')
(None, 0, ' starting ')
(datetime.datetime(2020, 1, 14, 15, 0), 2, 'at 3 pm')
(None, 0, ' to ')
(datetime.datetime(2020, 1, 14, 17, 0), 2, '5 pm')
(None, 0, '!')
```
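The question asks for the date/time plus the rest of the input with the date/time text removed. Given the `(dt, status, segment)` tuples that `parse()` returns, that only requires separating the two kinds of part. A minimal sketch; `split_parts()` is a hypothetical helper, and collapsing the leftover whitespace is my own choice, not something parsedatetime does:

```python
from datetime import datetime
from typing import List, Optional, Tuple

# One part as produced by parse(): (datetime or None, status, text segment)
Part = Tuple[Optional[datetime], int, str]

def split_parts(parts: List[Part]) -> Tuple[List[Tuple[datetime, int]], str]:
    """Hypothetical helper: return the parsed (datetime, status) pairs and the remaining text."""
    found = [(dt, status) for dt, status, _segment in parts if dt is not None]
    rest = "".join(segment for dt, _status, segment in parts if dt is None)
    # Removing a date/time usually leaves a double space behind; collapse it
    return found, " ".join(rest.split())
```

For the second string above, the remaining text comes out as `'Soccer with @homies at Payne Whitney starting !'`.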
The `status` tells you whether the associated `datetime` is actually a date (`1`), a time (`2`), a datetime (`3`) or neither (`0`). In the first two cases, the missing fields are taken from the `source_time`, or from the current time if that is `None`.
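Because a time-only part (status `2`) takes its date from `source_time` rather than from a nearby date in the text, one way to make use of the status in the third example is to move each time-only part onto the most recently seen date. A sketch under that assumption; `anchor_times()` and the constant names are hypothetical, and date-only parts serve purely as anchors (they are not returned themselves):

```python
from datetime import datetime
from typing import List, Optional, Tuple

# One part as produced by parse(): (datetime or None, status, text segment)
Part = Tuple[Optional[datetime], int, str]

DATE, TIME, DATETIME = 1, 2, 3  # parsedatetime status values described above

def anchor_times(parts: List[Part]) -> List[datetime]:
    """Hypothetical helper: give time-only parts the date of the last date-bearing part."""
    anchor = None  # date taken from the most recent part with status 1 or 3
    result = []
    for dt, status, _segment in parts:
        if dt is None:
            continue  # plain text between date/time expressions
        if status in (DATE, DATETIME):
            anchor = dt.date()
        if status == TIME and anchor is not None:
            dt = datetime.combine(anchor, dt.time())
        if status in (TIME, DATETIME):
            result.append(dt)
    return result
```

For the third string this yields 15:00 and 17:00 on 2020-01-15 (tomorrow) instead of on 2020-01-14. It cannot help with the first two strings, because there the start time was never returned.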
But if you examine the output closely, you will see a reliability problem. Only the third parse is usable; in the other two, information has been lost: `'tomorrow at 2 pm to 4 pm'` and `'at 2 pm to 4 pm'` each come back as a single datetime at 16:00, so the 2 pm start time disappears. I also have no idea why the second and third strings, which differ only in their hours, are parsed differently.
An alternative library is `dateparser`. It looks more powerful, but has its own problems. The `dateparser.search.search_dates()` function comes close to what you are interested in, but I haven't been able to find out how to tell whether a parsed `datetime` conveys only date information, only time information, or both. Anyway, here is a function that uses `search_dates()` to produce output similar to the above, but without the `status` of each part:
```python
from dateparser.search import search_dates

def parse(string: str):
    """Split `string` into date/time parts and plain-text parts, in order."""
    ret = []
    last_stop = 0
    # search_dates() returns None, or a list of (matched substring, datetime) pairs
    parsed_parts = search_dates(string)
    if parsed_parts:
        for segment, dt in parsed_parts:
            # No offsets are returned, so locate each segment after the previous match
            start = string.find(segment, last_stop)
            stop = start + len(segment)
            if start > last_stop:
                ret.append((None, string[last_stop:start]))
            ret.append((dt, segment))
            last_stop = stop
    # Keep trailing text (or the whole string if nothing was found)
    if len(string) > last_stop:
        ret.append((None, string[last_stop:]))
    return ret

for s in ("Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!",
          "Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!"):
    print()
    print(s)
    result = parse(s)
    for part in result:
        print(part)
```
Output:
```
Soccer with @homies at Payne Whitney tomorrow at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 14, 0), 'tomorrow at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 2 pm to 4 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 726130), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 14, 0), 'at 2 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 16, 0), '4 pm')
(None, '!')

Soccer with @homies at Payne Whitney tomorrow starting at 3 pm to 5 pm!
(None, 'Soccer with @homies at Payne Whitney ')
(datetime.datetime(2020, 1, 15, 0, 43, 0, 784468), 'tomorrow')
(None, ' starting ')
(datetime.datetime(2020, 1, 13, 15, 0), 'at 3 pm')
(None, ' to ')
(datetime.datetime(2020, 1, 13, 17, 0), '5 pm')
(None, '!')
```
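In all three `dateparser` outputs, a range such as "2 pm to 4 pm" comes back as two datetime parts separated by the plain-text part `' to '`. That regularity could be exploited to rebuild the range. A minimal sketch; `pair_ranges()` is a hypothetical helper working on the `(dt, segment)` tuples above, and it only pairs the parts: it does not fix the end date, which `dateparser` placed on 2020-01-13.

```python
from datetime import datetime
from typing import List, Optional, Tuple, Union

# One part as produced by the dateparser-based parse(): (datetime or None, text segment)
Part = Tuple[Optional[datetime], str]
# Output parts may carry a (start, end) pair instead of a single datetime
OutPart = Tuple[Union[None, datetime, Tuple[datetime, datetime]], str]

def pair_ranges(parts: List[Part]) -> List[OutPart]:
    """Hypothetical helper: merge 'X', ' to ', 'Y' into one ((start, end), text) part."""
    out: List[OutPart] = []
    i = 0
    while i < len(parts):
        window = parts[i:i + 3]
        if (len(window) == 3
                and window[0][0] is not None
                and window[1][0] is None
                and window[1][1].strip().lower() == "to"
                and window[2][0] is not None):
            (start, s1), (_, sep), (end, s2) = window
            out.append(((start, end), s1 + sep + s2))
            i += 3
        else:
            out.append(parts[i])
            i += 1
    return out
```

For the first string, the middle three parts collapse into a single part whose text is `'tomorrow at 2 pm to 4 pm'`.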
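I think that searching for the substring in the input is acceptable, and the parsing seems more predictable, but not knowing how to interpret each `datetime` is a problem. For example, `'tomorrow'` comes back with the current time of day (00:43:00.726130) rather than as a bare date, and `'4 pm'` lands on 2020-01-13 while `'tomorrow'` is 2020-01-15, with nothing indicating which fields were actually parsed.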
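One crude workaround for the missing status is to inspect the matched text itself and check whether it contains a date word and/or a clock time, producing the same `0` to `3` values parsedatetime uses. This is a hypothetical regex heuristic, not something `dateparser` provides, and the patterns only cover the kinds of expressions in the examples above:

```python
import re

# Hypothetical, deliberately small patterns covering only expressions like those above
DATE_RE = re.compile(
    r"\b(?:today|tomorrow|yesterday|monday|tuesday|wednesday|thursday|friday"
    r"|saturday|sunday)\b|\b\d{4}-\d{2}-\d{2}\b",
    re.IGNORECASE,
)
TIME_RE = re.compile(
    r"\b\d{1,2}(?::\d{2})?\s*(?:am|pm)\b|\b\d{1,2}:\d{2}\b|\b(?:noon|midnight)\b",
    re.IGNORECASE,
)

def classify(segment: str) -> int:
    """Return 1 for date, 2 for time, 3 for both, 0 for neither (parsedatetime-style)."""
    has_date = DATE_RE.search(segment) is not None
    has_time = TIME_RE.search(segment) is not None
    return (1 if has_date else 0) + (2 if has_time else 0)
```

Applied to the `dateparser` segments above, `'tomorrow at 2 pm'` gives `3`, `'tomorrow'` gives `1` and `'4 pm'` gives `2`. Anything outside these patterns (other languages, "next week", etc.) would be misclassified.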