How do I modify this regex to pick up all dates in the test string?

日久生厌 2021-01-20 20:41
test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''


import re
import calendar

months_f         


        
2 Answers
  • 2021-01-20 21:18

    You are very close!

    Try:

    import re
    import calendar
    
    test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''
    test_pattern = re.compile('|'.join(r'(?:\b%s\s+\d{1,2},\s+\d{4})' % month 
                                           for month in calendar.month_name[1:]))
    print(test_pattern.findall(test_string))
    # ['October 17, 2012', 'December 21, 2011']
    

    Other comments:

    1. There is no need for the optional ,? at the end of your regex. It does not validate a date any more than the first part of the regex does.
    2. You may need the re.I flag to make the match case-insensitive.
    3. A line break inside a legitimate date like December 21,\n2011 is already handled, because \s matches newlines; re.S (DOTALL) only changes what . matches, so it is not needed for this pattern.
    4. Use named capture groups to capture the month, day and year and then use datetime to validate the date.
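    Points 2 to 4 can be combined. This is a minimal sketch, not the answerer's code: the names MONTHS, DATE_RE and find_valid_dates are hypothetical. It compiles with re.I, relies on \s+ to span a line break, and lets datetime.strptime reject dates that match the regex but do not exist. It assumes the default C locale, so calendar.month_name and %B use English month names.

```python
import calendar
import re
from datetime import datetime

# "January|February|...|December"; calendar.month_name[0] is an empty string,
# and the names follow the current locale (English under the default C locale).
MONTHS = '|'.join(calendar.month_name[1:])

# Hypothetical pattern: named groups for month, day and year.
DATE_RE = re.compile(
    r'\b(?P<month>%s)\s+(?P<day>\d{1,2}),\s+(?P<year>\d{4})' % MONTHS,
    re.I,  # case-insensitive, so "october 17, 2012" matches too
)


def find_valid_dates(text):
    """Hypothetical helper: return a datetime for each matched date that really exists."""
    dates = []
    for m in DATE_RE.finditer(text):  # \s+ also spans a newline inside a date
        candidate = '%s %s %s' % (m.group('month'), m.group('day'), m.group('year'))
        try:
            dates.append(datetime.strptime(candidate, '%B %d %Y'))
        except ValueError:
            pass  # e.g. "February 30, 2012" matches the regex but is not a real date
    return dates


text = 'dated as of October 17, 2012, and December 21,\n2011, but not February 30, 2012'
for d in find_valid_dates(text):
    print(d.date())
# 2012-10-17
# 2011-12-21
```

    The regex only checks the shape of a date; the strptime call is what catches impossible days such as February 30.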
  • 2021-01-20 21:20

    Your pattern doesn't work because you forgot to put the alternation of month names in a non-capturing group (?:...).
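    A quick sketch of the problem: | has the lowest precedence of any regex operator, so without a group the day and year attach only to the last month. The question's own pattern is truncated above, so the ungrouped pattern here is an illustrative reconstruction, not the asker's exact code.

```python
import calendar
import re

test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''
months = '|'.join(calendar.month_name[1:])

# Ungrouped: parsed as "January" OR ... OR "October" OR ... OR "December\s+\d{1,2},\s+\d{4}".
ungrouped = re.compile(r'\b' + months + r'\s+\d{1,2},\s+\d{4}')
# Grouped: the day and year now follow whichever month matched.
grouped = re.compile(r'\b(?:' + months + r')\s+\d{1,2},\s+\d{4}')

print(ungrouped.findall(test_string))  # ['October', 'December 21, 2011']
print(grouped.findall(test_string))    # ['October 17, 2012', 'December 21, 2011']
```

    Without the group, the bare October alternative matches on its own, which is why the first "date" comes back as just the month name.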

    Another note:

    It's a shame to load a module just to get the English month names when you can write them out yourself and optimise your pattern. Example:

    pattern_1 = r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?'
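    Usage sketch: because the months are spelled in lowercase, this pattern only matches the capitalised dates in the question's test string when the re.I flag is passed. The flag is an addition here, not part of the answer as written.

```python
import re

pattern_1 = r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?'
test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''

# Without re.I the lowercase month names never match "October" or "December".
print(re.findall(pattern_1, test_string))        # []
# With re.I both dates are found.
print(re.findall(pattern_1, test_string, re.I))  # ['October 17, 2012', 'December 21, 2011']
```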
    