test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''
import re
import calendar
months_f
You are very close!
Try:
import re
import calendar
test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''
test_pattern = re.compile('|'.join(r'(?:\b%s\s+\d{1,2},\s+\d{4})' % month
                                   for month in calendar.month_name[1:]))
print(test_pattern.findall(test_string))
# ['October 17, 2012', 'December 21, 2011']
Other comments:
You don't need the ,? at the end of your regex. It does not validate a date any more than the first part of the regex does.
Note also that \s+ matches line breaks, so a date wrapped as December 21,\n2011 still matches.
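As a rough illustration of the two points about a trailing ,? and a date wrapped across lines, here is a minimal sketch. The single-month patterns and the names with_comma, without_comma and wrapped are made up for this example and are not taken from the question.

```python
import re

# Hypothetical single-month patterns, for illustration only.
with_comma = re.compile(r'\bDecember\s+\d{1,2},\s+\d{4},?')
without_comma = re.compile(r'\bDecember\s+\d{1,2},\s+\d{4}')

# The date is wrapped across two lines and followed by a comma.
wrapped = 'dated as of December 21,\n2011, when we ate bananas'

# \s+ matches the newline, so both patterns find the date.
# The trailing ,? only pulls the following comma into the match.
print(with_comma.findall(wrapped))
print(without_comma.findall(wrapped))
```

Both patterns accept exactly the same dates. The optional comma only changes how much text ends up in the match, which is usually not what you want when extracting the date itself.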
Your pattern doesn't work because you forgot to put the alternation of month names in a non-capturing group (?:...)
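The question's full pattern is not shown here, so the following is only a sketch of how grouping changes what findall returns. The names ungrouped, capturing and grouped are made up for the comparison.

```python
import re
import calendar

text = 'dated as of October 17, 2012 when we went caroling'
months = '|'.join(calendar.month_name[1:])  # 'January|February|...|December'
date_tail = r'\s+\d{1,2},\s+\d{4}'

# No group: | has the lowest precedence, so the date part
# attaches only to the last alternative ("December").
ungrouped = re.compile(r'\b' + months + date_tail)

# Capturing group: the whole date matches, but findall
# returns only the captured month name.
capturing = re.compile(r'\b(' + months + r')' + date_tail)

# Non-capturing group: findall returns the whole match.
grouped = re.compile(r'\b(?:' + months + r')' + date_tail)

print(ungrouped.findall(text))
print(capturing.findall(text))
print(grouped.findall(text))
```

Only the non-capturing version returns the full date string. The other two return just the month name, for two different reasons.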
Another note:
It's a shame to load a module just to get the month names in English when you can write them out yourself and optimise your pattern! Example:
pattern_1 = r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?'
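Since pattern_1 spells the month names in lowercase, it presumably has to be compiled case-insensitively to match text like "October". A minimal usage sketch follows; the names date_re and matches are introduced here only for illustration.

```python
import re

pattern_1 = r'\b(?:(?:jan|febr)uary|ma(?:y|rch)|ju(?:ne|ly)|a(?:pril|ugust)|(?:octo|(?:sept|nov|dec)em)ber)\s+[0-9]{1,2},?\s+[0-9]{4},?'

test_string = '''dated as of October 17, 2012 when we went caroling, dated as of December 21, 2011 when we ate bananas'''

# re.IGNORECASE lets the lowercase month names match capitalised text.
date_re = re.compile(pattern_1, re.IGNORECASE)

# Every group in pattern_1 is non-capturing, so findall returns whole matches.
matches = date_re.findall(test_string)
print(matches)
# ['October 17, 2012', 'December 21, 2011']
```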