In Python, how to parse a string representing a set of keyword arguments such that the order does not matter

走了就别回头了 2021-01-18 11:59

I'm writing a class RecurringInterval which, based on the dateutil.rrule object, represents a recurring interval in time. I have defined a custom, human-readable …

2 answers
  • 2021-01-18 12:40

    At this point, your language is getting complex enough that it's time to ditch regular expressions and learn how to use a proper parsing library. I threw this together using pyparsing and annotated it heavily to explain what's going on; if anything is unclear, just ask and I'll try to explain.

    from pyparsing import Regex, oneOf, OneOrMore
    
    # Boring old constants, I'm sure you know how to fill these out...
    months      = ['January', 'February']
    weekdays    = ['Monday', 'Tuesday']
    frequencies = ['Daily', 'Weekly']
    
    # A datetime expression is anything matching this regex. We could split it down
    # even further to get day, month, year attributes in our results object if we felt
    # like it
    datetime_expr = Regex(r'(\d{4})-(\d\d?)-(\d\d?) (\d{2}):(\d{2}):(\d{2})')
    
    # A from or till expression is the word "from" or "till" followed by any valid datetime
    from_expr = 'from' + datetime_expr.setResultsName('from_')
    till_expr = 'till' + datetime_expr.setResultsName('till')
    
    # A range expression is a from expression followed by a till expression
    range_expr = from_expr + till_expr
    
    # A weekday is any old weekday
    weekday_expr = oneOf(weekdays)
    month_expr = oneOf(months)
    frequency_expr = oneOf(frequencies)
    
    # A by weekday expression is the words "by weekday" followed by one or more weekdays
    by_weekday_expr = 'by weekday' + OneOrMore(weekday_expr).setResultsName('weekdays')
    by_month_expr = 'by month' + OneOrMore(month_expr).setResultsName('months')
    
    # A recurring interval, then, is a frequency, followed by a range, followed by
    # a "by weekday" clause and a "by month" clause, in any order (that is what & does)
    recurring_interval = frequency_expr + range_expr + (by_weekday_expr & by_month_expr)
    
    # Let's parse!
    if __name__ == '__main__':
        res = recurring_interval.parseString('Daily from 1111-11-11 11:00:00 till 1111-11-11 12:00:00 by weekday Monday by month January February')
    
        # Note that setResultsName causes everything to get packed neatly into
        # attributes for us, so we can pluck all the bits and pieces out with no
        # difficulty at all
        print(res)
        print(res.from_)
        print(res.till)
        print(res.weekdays)
        print(res.months)
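
    To check the key claim, that & makes the order of the two "by" clauses irrelevant, here is a minimal sketch that reuses only the clause definitions above with the same trimmed constants. It assumes Python 3 with pyparsing installed; parse_clauses is a hypothetical helper added for the demo. It parses both orders and shows that a missing clause is rejected.

```python
from pyparsing import OneOrMore, ParseException, oneOf

# Trimmed constants, as in the answer above
weekdays = ['Monday', 'Tuesday']
months = ['January', 'February']

# Same clause definitions as in the answer
by_weekday_expr = 'by weekday' + OneOrMore(oneOf(weekdays)).setResultsName('weekdays')
by_month_expr = 'by month' + OneOrMore(oneOf(months)).setResultsName('months')

# '&' combines the clauses into an Each: both are required, in either order
clauses = by_weekday_expr & by_month_expr

def parse_clauses(text):
    """Hypothetical helper: parse the two 'by' clauses and return plain lists."""
    res = clauses.parseString(text, parseAll=True)
    return list(res.weekdays), list(res.months)

print(parse_clauses('by weekday Monday by month January February'))
print(parse_clauses('by month January February by weekday Monday'))

try:
    parse_clauses('by weekday Monday')  # the "by month" clause is missing
except ParseException as exc:
    print('rejected:', exc)
```

    Both orders should give the same weekdays and months. The last call fails because every expression joined with & is required unless it is wrapped in Optional.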
    
  • 2021-01-18 12:40

    You have many options here, each with different downsides.

    One approach would be to use a repeated alternation, like (by weekday|by month)*:

    (?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?)(?:\s+by weekday (?P<byweekday>.+?)|\s+by month (?P<bymonth>.+?))*$
    

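    This matches the two clauses in either order (by weekday then by month, or by month then by weekday), but it also accepts repeats such as by weekday twice, or by month, by weekday, by month. When a clause repeats, its named group keeps only the last occurrence.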
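
    A quick way to see both the benefit and the downside is to run the pattern above with Python's re module. This is a minimal sketch; parse is a hypothetical helper, and the sample dates and the MO/JAN/TU values are made up for the demo.

```python
import re

# The repeated-alternation pattern from the answer, unchanged
PATTERN = re.compile(
    r'(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?)'
    r'(?:\s+by weekday (?P<byweekday>.+?)|\s+by month (?P<bymonth>.+?))*$'
)

def parse(text):
    """Hypothetical helper: return the named groups, or None if there is no match."""
    m = PATTERN.match(text)
    return m.groupdict() if m else None

for text in [
    'Weekly from 2021-01-01 till 2021-12-31 by weekday MO by month JAN',
    'Weekly from 2021-01-01 till 2021-12-31 by month JAN by weekday MO',
    # Also accepted: a repeated clause; the group keeps only its last repetition
    'Weekly from 2021-01-01 till 2021-12-31 by weekday MO by weekday TU',
]:
    print(parse(text))
```

    The first two strings yield the same byweekday and bymonth values. In the third, the first weekday value is silently lost and bymonth is None.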

    Another option would be to use lookaheads, like (?=.*by weekday)?(?=.*by month)?:

     (?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?(?=$| by))(?=.*\s+by weekday (?P<byweekday>.+?(?=$| by))|)(?=.*\s+by month (?P<month>.+?(?=$| by))|)
    

    However, this requires a known delimiter (I used " by") to know where each value ends. It also silently ignores extra characters, meaning it will still match strings of the form by weekday [some garbage] by month.
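
    The same kind of check for the lookahead version, again as a minimal sketch with Python's re module. The pattern is used exactly as given above, so its month group is named month rather than bymonth. parse is a hypothetical helper, and the sample strings are made up for the demo.

```python
import re

# The lookahead pattern from the answer, unchanged (its month group is named "month")
PATTERN = re.compile(
    r'(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?(?=$| by))'
    r'(?=.*\s+by weekday (?P<byweekday>.+?(?=$| by))|)'
    r'(?=.*\s+by month (?P<month>.+?(?=$| by))|)'
)

def parse(text):
    """Hypothetical helper: return the named groups, or None if there is no match."""
    m = PATTERN.match(text)
    return m.groupdict() if m else None

for text in [
    'Weekly from 2021-01-01 till 2021-12-31 by weekday MO by month JAN',
    'Weekly from 2021-01-01 till 2021-12-31 by month JAN by weekday MO',
    # A bogus " by junk" clause is skipped over instead of being rejected
    'Weekly from 2021-01-01 till 2021-12-31 by weekday MO by junk by month JAN',
]:
    print(parse(text))
```

    Because lookaheads are zero-width, the match itself ends right after the till date. Nothing checks that the rest of the string was consumed, which is why the " by junk" clause goes unnoticed.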
