I am trying to write a regular expression that will match values such as U.S.
, D.C.
, U.S.A.
, etc.
Here is what I have so far -
This should work:
/([a-zA-Z]\.)+/g
None of these proposed solutions do what yours does - make sure that there are at least 2 letters in the acronym. Also, yours works on http://rubular.com/ . This is probably some issue with the regex implementation - to be fair, all of the matches that you got were valid acronyms. To fix this, you could either:
(?=\s|$)
in ruby at least)^
and $
to make sure it catches the whole string. You'd have to split the whole string on spaces to get matches with this though.I prefer the former solution - to do this you'd have:
\b([a-zA-Z]\.){2,}(?=\s|$)
Edit: I've realized this doesn't actually work with other punctuation in the string, and a couple of other edge cases. This is super ugly, but I think it should be good enough:
(?<=\s|^)((?:[a-zA-Z]\.){2,})(?=[[:punct:]]?(?:\s|$))
This assumes that you've got this [[:punct:]] character class, and allows for 0-1 punctuation marks after an acronym that won't be captured. I've also fixed it up so that there's a single capture group that gets the whole acronym. Check out validation at http://rubular.com/r/lmr0qERLDh
Bonus: you now get to make this super confusing to anyone reading it.
Try (?:[a-zA-Z]\.){2,}
?:
(non-capturing group) is there because you want to omit capturing the last iteration of the repeated group.
For example, without ?:
, 'U.S.A.' will yield a group match 'A.', which you are not interested about.