How can the regex below be modified to match dates with ordinals on the day part? This regex matches "Jan 1, 2003 | February 29, 2004 | November 02, 3202", but I also need it to match: "Jan 1st, 2003 | February 29th, 2004 | November 02nd, 3202 | March 3rd, 2010"
^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))))\,\ ((1[6-9]|[2-9]\d)\d{2}))
Thank you.
This will depend on your use case, but in the interest of pragmatism, you might do well to simply match any string made up of:
(1) any month name or abbreviation;
(2) whitespace;
(3) one or two digits;
(4) optionally, one of st, nd, rd, or th;
(5) whitespace OR a comma plus optional whitespace;
(6) any four digits.
I'm not sure what you're matching against, but if I had "Jan 35nd,3001", I think I'd rather capture it now and invalidate it later than skip over it right at the get-go.
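A minimal Python sketch of the capture-now, invalidate-later idea, assuming the standard `re` and `datetime` modules. The loose pattern, the `MONTHS` table, and the helper name `parse_loose_date` are illustrative, not from the answer: the regex only checks the overall shape, and `datetime.date` then rejects impossible days such as 35.

```python
import datetime
import re

# Loose shape check: month prefix, 1-2 digit day, optional two-letter suffix,
# then whitespace or a comma as separator, and a four-digit year.
LOOSE = re.compile(
    r"(?P<mon>jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+"
    r"(?P<day>\d{1,2})(?:[a-z]{2})?(?:\s+|,\s*)(?P<year>\d{4})\b",
    re.IGNORECASE,
)

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_loose_date(text):
    """Return (matched_text, date or None), or None if nothing date-shaped is found."""
    m = LOOSE.search(text)
    if m is None:
        return None  # nothing that even looks like a date
    month = MONTHS.index(m.group("mon").lower()) + 1
    try:
        value = datetime.date(int(m.group("year")), month, int(m.group("day")))
    except ValueError:
        value = None  # shape matched, but the date itself is impossible
    return m.group(0), value


print(parse_loose_date("Due Jan 35nd,3001"))    # captured, then rejected
print(parse_loose_date("Due March 3rd, 2010"))  # captured and valid
```

Keeping the matched text alongside a `None` date lets you report or log bad input instead of silently skipping it.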
Also, depending on your data set, consider case sensitivity issues and common international English variants, like "1 Jan 2004" or "1st Jan, 2004" or "January, 2004", etc.
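(Line breaks added for readability; join the lines before use, and match case-insensitively, since the month names are written in lowercase.)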
^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)|feb(?:ruary)?|ma(?:r(?:ch)?|y)
|a(?:pr(?:il)?|ug(?:ust)?)|sep(?:t|tember)?|oct(?:ober)?|(?:nov|dec)(?:ember)?)
\s+\d{1,2}(?:st|nd|rd|th)?(?:\s+|,\s*)\d{4}\b
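For illustration, here is one way to use the pattern above from Python, assuming the standard `re` module: `re.VERBOSE` lets the line breaks stay in place (the pattern spells its spaces as `\s`, so ignoring layout whitespace is safe), and `re.IGNORECASE` lets capitalized month names match. In this copy the group after `j` is required, so a lone "j" is not accepted as a month.

```python
import re

# Multi-line month/day/year pattern; VERBOSE ignores the layout whitespace,
# IGNORECASE lets "Jan", "JAN", etc. match the lowercase month names.
# Note: the group after "j" is required here, so a lone "j" is not read as a month.
DATE = re.compile(
    r"""
    ^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)|feb(?:ruary)?|ma(?:r(?:ch)?|y)
    |a(?:pr(?:il)?|ug(?:ust)?)|sep(?:t|tember)?|oct(?:ober)?|(?:nov|dec)(?:ember)?)
    \s+\d{1,2}(?:st|nd|rd|th)?(?:\s+|,\s*)\d{4}\b
    """,
    re.VERBOSE | re.IGNORECASE,
)

samples = ["Jan 1st, 2003", "February 29th, 2004", "November 02nd, 3202",
           "March 3rd, 2010", "Jan 1, 2003"]
for s in samples:
    print(s, "->", bool(DATE.match(s)))
```

As the answer intends, this still accepts shape-valid nonsense like "Jan 35nd,3001"; validity is left to a later check.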
Even more pragmatic (and readable), unless you have a very bizarre dataset, is to allow anything after the common prefixes:
(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b
Would this match "octagenarianism 99xx, 0000"? Yes. Is that likely to be an issue? I doubt it.
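A quick Python check of the prefix-only pattern, assuming the `re` module and a case-insensitive match (the name `PREFIX_DATE` is just for illustration), confirms both an intended match and the harmless false positive:

```python
import re

# Any common three-letter month prefix, then any letters (lazily), so "Sept",
# "September" and misspellings all pass; then a 1-2 digit day, an optional
# two-letter suffix, a separator, and a four-digit year.
PREFIX_DATE = re.compile(
    r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?\s+"
    r"\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b",
    re.IGNORECASE,
)

print(bool(PREFIX_DATE.search("Sept 3rd 2010")))               # intended match
print(bool(PREFIX_DATE.search("octagenarianism 99xx, 0000")))  # the false positive
```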
That regex is doing waaaaay too much. You'd be much better off using your language's equivalent of strptime(). However, the regex below will match ordinals:
^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31(st)?)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))(st|nd|rd|th)?|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(th)?(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))(st|nd|rd|th)?))\,\ ((1[6-9]|[2-9]\d)\d{2}))
Note that it will also match mismatched suffixes like "20nd", but the likelihood of encountering that in real data is low enough that it isn't worth caring about in most cases.
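Following the strptime() advice, here is a minimal Python sketch, assuming English month names (Python's default C locale for `%b`/`%B`) and the "Month day, year" layout from the question; `parse_date` and `expected_suffix` are hypothetical helper names. A small regex only splits the pieces apart, the suffix is checked against the day so "20nd" is rejected, and `datetime.strptime` does the calendar validation, including leap years.

```python
import re
from datetime import datetime


def expected_suffix(day):
    """English ordinal suffix for a day number: 1st, 2nd, 3rd, 4th, ..., 11th-13th."""
    if 11 <= day % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def parse_date(text):
    """Parse 'Jan 1st, 2003'-style dates; return a date, or None if invalid."""
    m = re.fullmatch(r"\s*([A-Za-z]+)\s+(\d{1,2})([A-Za-z]{2})?,\s*(\d{4})\s*", text)
    if m is None:
        return None
    month, day, suffix, year = m.groups()
    if suffix is not None and suffix.lower() != expected_suffix(int(day)):
        return None  # e.g. "20nd": a suffix that doesn't fit the day
    cleaned = f"{month} {day}, {year}"
    # Try abbreviated ("Jan") then full ("January") month names.
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # wrong month-name style, or an impossible date
    return None
```

Because strptime() rejects "February 29th, 2005" on its own, no leap-year arithmetic has to live in the pattern.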
Source: https://stackoverflow.com/questions/2118825/modify-regex-to-match-dates-with-ordinals-st-nd-rd-th