I want to match dates that have one of the following formats:
2010-08-27, 2010/08/27
Right now I am not very particular about the date being actually feasible, but
You can use the datetime module to parse dates:
import datetime
print datetime.datetime.strptime('2010-08-27', '%Y-%m-%d')
print datetime.datetime.strptime('2010-15-27', '%Y-%m-%d')
output:
2010-08-27 00:00:00
Traceback (most recent call last):
  File "./x.py", line 6, in <module>
    print datetime.datetime.strptime('2010-15-27', '%Y-%m-%d')
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2010-15-27' does not match format '%Y-%m-%d'
So catching ValueError will tell you if the date matches:
def valid_date(datestring):
    try:
        datetime.datetime.strptime(datestring, '%Y-%m-%d')
        return True
    except ValueError:
        return False
To allow for various formats you could either test for all possibilities, or use re to parse out the fields first:
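import datetime
import re
def valid_date(datestring):
    try:
        # accepts day, month, year order with '/', '.' or '-' as separators
        mat = re.match(r'(\d{2})[/.-](\d{2})[/.-](\d{4})$', datestring)
        if mat is not None:
            # reverse the groups to (year, month, day); raises ValueError if infeasible
            datetime.datetime(*map(int, mat.groups()[::-1]))
            return True
    except ValueError:
        pass
    return False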
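The first option, testing all possibilities, could look roughly like the sketch below. The name valid_date_any and the FORMATS tuple are illustrative and not part of the answer above; the function simply tries each strptime format in turn.

```python
import datetime

# Formats to try, in order (illustrative choice; extend as needed).
FORMATS = ('%Y-%m-%d', '%Y/%m/%d')

def valid_date_any(datestring):
    """Return True if datestring parses under any format in FORMATS."""
    for fmt in FORMATS:
        try:
            datetime.datetime.strptime(datestring, fmt)
            return True
        except ValueError:
            continue  # wrong format or impossible date; try the next one
    return False

print(valid_date_any('2010-08-27'))  # True
print(valid_date_any('2010/08/27'))  # True
print(valid_date_any('2010-15-27'))  # False
```

Because each format uses a single separator throughout, mixed separators such as 2010/08-27 are rejected as a side effect.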
Use the datetime module. Here is a regex for reference, although you shouldn't rely on it:
r'\d{4}[-/]\d{2}[-/]\d{2}'
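To illustrate why the regex alone is a weak check, here is a small sketch. The names DATE_RE and looks_like_date are made up for this example, and the trailing \Z is added here as an assumption so that extra characters after the date are rejected.

```python
import re

# Same pattern as above, with \Z so the match must reach the end of the string.
DATE_RE = re.compile(r'\d{4}[-/]\d{2}[-/]\d{2}\Z')

def looks_like_date(s):
    # re.match anchors at the start of the string; \Z anchors at the end.
    return DATE_RE.match(s) is not None

print(looks_like_date('2010-08-27'))  # True
print(looks_like_date('2010/08/27'))  # True
print(looks_like_date('2010-15-27'))  # True: right shape, impossible month
print(looks_like_date('2010/08-27'))  # True: mixed separators are accepted
print(looks_like_date('27-08-2010'))  # False
```

The pattern only checks the shape of the string, which is fine if feasibility does not matter, but it cannot tell a real date from month 15.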
You can use this code:
import re
# regular expression to match dates in format: 2010-08-27 and 2010/08/27
# date_reg_exp = re.compile('(\d+[-/]\d+[-/]\d+)')
# updated regular expression below:
# regular expression to match dates in format: 2010-08-27 and 2010/08/27
# and with mixed separators 2010/08-27
# date_reg_exp = re.compile(r'\d{4}[-/]\d{2}[-/]\d{2}')
# if separators should not be mixed use backreference:
date_reg_exp = re.compile(r'\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')
# a string to test the regular expression above
test_str= """
fsf2010/08/27sdfsdfsd
dsf sfds f2010/08/26 fsdf
asdsds 2009-02-02 afdf
"""
# findall() would return only the captured separator because the pattern
# contains a group, so iterate over the match objects instead
for match in date_reg_exp.finditer(test_str):
    # group(0) is the full text of the match
    print(match.group(0))
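If feasibility does matter later, the regex search can be combined with the strptime check from the earlier answer. This is only a sketch: find_valid_dates and DATE_RE are hypothetical names, and the format string is built from whichever separator the regex captured.

```python
import datetime
import re

# Same backreference pattern as above: both separators must be identical.
DATE_RE = re.compile(r'\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')

def find_valid_dates(text):
    """Return datetime.date objects for every feasible date found in text."""
    found = []
    for m in DATE_RE.finditer(text):
        # Build '%Y-%m-%d' or '%Y/%m/%d' from the captured separator.
        fmt = '%Y{0}%m{0}%d'.format(m.group('sep'))
        try:
            found.append(datetime.datetime.strptime(m.group(0), fmt).date())
        except ValueError:
            pass  # right shape but impossible date, e.g. month 15
    return found

print(find_valid_dates('a 2010/08/27 b 2010-15-27 c 2009-02-02'))
# [datetime.date(2010, 8, 27), datetime.date(2009, 2, 2)]
```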
The dateutil package has a fairly smart date parser that handles a wide range of date formats: http://pypi.python.org/pypi/python-dateutil