RegEx to match acronyms

前端未结

关注

 3  1105

旧时难觅i

I am trying to write a regular expression that will match values such as U.S., D.C., U.S.A., etc.

Here is what I have so far -

相关标签:

3条回答

傲寒

2021-01-16 11:11
This should work:
```
/([a-zA-Z]\.)+/g
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
死守一世寂寞

2021-01-16 11:19
None of these proposed solutions do what yours does - make sure that there are at least 2 letters in the acronym. Also, yours works on http://rubular.com/ . This is probably some issue with the regex implementation - to be fair, all of the matches that you got were valid acronyms. To fix this, you could either:
- Make sure there's a space or EOF succeeding your expression ((?=\s|$) in ruby at least)
- Surround your regex with ^ and $ to make sure it catches the whole string. You'd have to split the whole string on spaces to get matches with this though.
I prefer the former solution - to do this you'd have:

\b([a-zA-Z]\.){2,}(?=\s|$)

Edit: I've realized this doesn't actually work with other punctuation in the string, and a couple of other edge cases. This is super ugly, but I think it should be good enough:

(?<=\s|^)((?:[a-zA-Z]\.){2,})(?=[[:punct:]]?(?:\s|$))

This assumes that you've got this [[:punct:]] character class, and allows for 0-1 punctuation marks after an acronym that won't be captured. I've also fixed it up so that there's a single capture group that gets the whole acronym. Check out validation at http://rubular.com/r/lmr0qERLDh

Bonus: you now get to make this super confusing to anyone reading it.
0 讨论(0)
发布评论:

提交评论
- 加载中...
轻奢々

2021-01-16 11:23

Try (?:[a-zA-Z]\.){2,}

?: (non-capturing group) is there because you want to omit capturing the last iteration of the repeated group.

For example, without ?:, 'U.S.A.' will yield a group match 'A.', which you are not interested about.

0 讨论(0)
发布评论:

提交评论
- 加载中...