I'm writing a class RecurringInterval
which - based on the dateutil.rrule object - represents a recurring interval in time. I have defined a custom, human-read
At this point, your language is getting complex enough that it's time to ditch regular expressions and learn how to use a proper parsing library. I threw this together using pyparsing, and I've annotated it heavily to try and explain what's going on, but if anything's unclear do ask and I'll try to explain.
from pyparsing import Regex, oneOf, OneOrMore
# Boring old constants, I'm sure you know how to fill these out...
months = ['January', 'February']
weekdays = ['Monday', 'Tuesday']
frequencies = ['Daily', 'Weekly']
# A datetime expression is anything matching this regex. We could split it down
# even further to get day, month, year attributes in our results object if we felt
# like it
datetime_expr = Regex(r'(\d{4})-(\d\d?)-(\d\d?) (\d{2}):(\d{2}):(\d{2})')
# A from or till expression is the word "from" or "till" followed by any valid datetime
from_expr = 'from' + datetime_expr.setResultsName('from_')
till_expr = 'till' + datetime_expr.setResultsName('till')
# A range expression is a from expression followed by a till expression
range_expr = from_expr + till_expr
# A weekday is any old weekday
weekday_expr = oneOf(weekdays)
month_expr = oneOf(months)
frequency_expr = oneOf(frequencies)
# A by weekday expression is the words "by weekday" followed by one or more weekdays
by_weekday_expr = 'by weekday' + OneOrMore(weekday_expr).setResultsName('weekdays')
by_month_expr = 'by month' + OneOrMore(month_expr).setResultsName('months')
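# A recurring interval, then, is a frequency, followed by a range, followed by
# a "by weekday" clause and a "by month" clause, in either order (the & operator
# requires both clauses to be present but does not care which comes first)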
recurring_interval = frequency_expr + range_expr + (by_weekday_expr & by_month_expr)
# Let's parse!
if __name__ == '__main__':
    res = recurring_interval.parseString('Daily from 1111-11-11 11:00:00 till 1111-11-11 12:00:00 by weekday Monday by month January February')
    # Note that setResultsName causes everything to get packed neatly into
    # attributes for us, so we can pluck all the bits and pieces out with no
    # difficulty at all
    print(res)
    print(res.from_)
    print(res.till)
    print(res.weekdays)
    print(res.months)
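The from_ and till results come back as plain strings, so they still need converting before they can be handed to something like dateutil.rrule. A minimal sketch of that step using only the standard library; the names DATETIME_FORMAT and to_datetime are made up for illustration, and the format string is assumed to match what datetime_expr accepts:

```python
from datetime import datetime

# Assumed to mirror the layout accepted by datetime_expr above.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_datetime(text):
    """Convert a matched 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(text, DATETIME_FORMAT)


# The same two values used in the parsing example above.
start = to_datetime("1111-11-11 11:00:00")
end = to_datetime("1111-11-11 12:00:00")
print(end - start)  # 1:00:00
```

With pyparsing you could probably attach this conversion to datetime_expr as a parse action instead, so the results object holds datetime values directly.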
You have many options here, each with different downsides.
One approach would be to use a repeated alternation, like (by weekday|by month)*:
(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?)(?:\s+by weekday (?P<byweekday>.+?)|\s+by month (?P<bymonth>.+?))*$
This will match strings of the form "week month" and "month week", but also "week week" or "month week month", etc.
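A quick way to see that behaviour is to run the pattern through Python's re module. This is only a sketch: the helper name parse_alternation and the sample strings are invented for illustration.

```python
import re

# The repeated-alternation pattern from above, split over two lines.
ALTERNATION = re.compile(
    r"(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?)"
    r"(?:\s+by weekday (?P<byweekday>.+?)|\s+by month (?P<bymonth>.+?))*$"
)


def parse_alternation(text):
    """Return the named groups as a dict, or None if the text does not match."""
    match = ALTERNATION.match(text)
    return match.groupdict() if match else None


prefix = "Weekly from 2011-01-01 till 2011-12-31"

# Both orders are accepted.
week_month = parse_alternation(prefix + " by weekday Monday by month January")
month_week = parse_alternation(prefix + " by month January by weekday Monday")

# A repeated clause is accepted too; the group keeps only the last occurrence.
week_week = parse_alternation(prefix + " by weekday Monday by weekday Tuesday")

print(week_month)
print(month_week)
print(week_week)
```

Note the downside in the last case: the first "by weekday" value is lost, because a group that matches several times only remembers its final match.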
Another option would be to use lookaheads, like (?=.*by weekday)?(?=.*by month)?:
(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?(?=$| by))(?=.*\s+by weekday (?P<byweekday>.+?(?=$| by))|)(?=.*\s+by month (?P<month>.+?(?=$| by))|)
However, this requires a known delimiter (I used " by") to know how far to match. Also, it'll silently ignore any extra characters (meaning it'll match strings of the form "by weekday [some garbage] by month").
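The same kind of check for the lookahead version, again just a sketch with an invented helper name (parse_lookahead) and made-up input strings. The group names are kept exactly as in the pattern above, so the month ends up in "month" rather than "bymonth".

```python
import re

# The lookahead pattern from above, split over three lines.
LOOKAHEAD = re.compile(
    r"(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?(?=$| by))"
    r"(?=.*\s+by weekday (?P<byweekday>.+?(?=$| by))|)"
    r"(?=.*\s+by month (?P<month>.+?(?=$| by))|)"
)


def parse_lookahead(text):
    """Return the named groups as a dict, or None if the text does not match."""
    match = LOOKAHEAD.match(text)
    return match.groupdict() if match else None


prefix = "Weekly from 2011-01-01 till 2011-12-31"

# Clause order does not matter, since each lookahead scans independently.
clean = parse_lookahead(prefix + " by month January by weekday Monday")

# The unknown "by garbage" clause is skipped without any error.
noisy = parse_lookahead(prefix + " by weekday Monday by garbage by month January")

print(clean)
print(noisy)
```

Both calls produce the same groups, which is exactly the silent-ignore problem: if you need to reject unknown clauses, you have to validate the rest of the string separately.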