RegEx to match acronyms

前端 未结 3 1105
旧时难觅i
旧时难觅i 2021-01-16 10:24

I am trying to write a regular expression that will match values such as U.S., D.C., U.S.A., etc.

Here is what I have so far -

相关标签:
3条回答
  • 2021-01-16 11:11

    This should work:

    /([a-zA-Z]\.)+/g
    
    0 讨论(0)
  • 2021-01-16 11:19

    None of these proposed solutions do what yours does - make sure that there are at least 2 letters in the acronym. Also, yours works on http://rubular.com/ . This is probably some issue with the regex implementation - to be fair, all of the matches that you got were valid acronyms. To fix this, you could either:

    • Make sure there's a space or EOF succeeding your expression ((?=\s|$) in ruby at least)
    • Surround your regex with ^ and $ to make sure it catches the whole string. You'd have to split the whole string on spaces to get matches with this though.

    I prefer the former solution - to do this you'd have:

    \b([a-zA-Z]\.){2,}(?=\s|$)

    Edit: I've realized this doesn't actually work with other punctuation in the string, and a couple of other edge cases. This is super ugly, but I think it should be good enough:

    (?<=\s|^)((?:[a-zA-Z]\.){2,})(?=[[:punct:]]?(?:\s|$))

    This assumes that you've got this [[:punct:]] character class, and allows for 0-1 punctuation marks after an acronym that won't be captured. I've also fixed it up so that there's a single capture group that gets the whole acronym. Check out validation at http://rubular.com/r/lmr0qERLDh

    Bonus: you now get to make this super confusing to anyone reading it.

    0 讨论(0)
  • 2021-01-16 11:23

    Try (?:[a-zA-Z]\.){2,}

    ?: (non-capturing group) is there because you want to omit capturing the last iteration of the repeated group.

    For example, without ?:, 'U.S.A.' will yield a group match 'A.', which you are not interested about.

    0 讨论(0)
提交回复
热议问题