First Sentence Regex

遇见更好的自我 2021-01-20 03:54

I'm after a regex (PHP/Perl compatible) to get the first sentence out of some text. I realize this could get huge if covering every case, but just after something that

7 answers
  • 2021-01-20 04:32

    What you need, in the end, is natural language parsing, which is extremely difficult to do and probably impossible for regular expressions alone (even super-souped-up PCRE ones). Consider this sentence:

    So much for Mr. Regex and his sentence matching.

    Every answer given thus far will parse that as two sentences, and this isn't even that much of an edge case - it's quite reasonable to imagine a block of text beginning with "Dear Mr. Adams:" or something like that. You can tack on lookbehinds to check what the word before the punctuation mark was, but that's going to get unmaintainable, since you have to check for every possible abbreviation. You have to check for Mr. and e.g. and co. and St. and for so many other ones that you'll never think of. You might end up with a "pretty good" practical solution after a while, but it's going to be ugly, and one day it will fail.
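
To make the point concrete, here is a minimal sketch in Python (the question asks for PHP/Perl-compatible regex; Python's `re` is used here only for illustration). The `first_sentence` function and the `ABBREVIATIONS` list are hypothetical names, and the list is deliberately incomplete. The sketch tacks one negative lookbehind per abbreviation onto a naive "up to the first terminator" pattern:

```python
import re

# Hypothetical, deliberately incomplete list of abbreviations (without the dot).
ABBREVIATIONS = ("Mr", "Mrs", "Dr", "St", "e.g", "co")


def first_sentence(text, abbreviations=ABBREVIATIONS):
    """Return text up to the first '.', '!' or '?' followed by whitespace or
    end of input, skipping terminators directly preceded by a listed abbreviation."""
    # One fixed-width negative lookbehind per abbreviation, e.g. (?<!\bMr)
    guards = "".join(rf"(?<!\b{re.escape(a)})" for a in abbreviations)
    pattern = re.compile(rf".*?{guards}[.!?](?=\s|$)", re.DOTALL)
    m = pattern.match(text)
    return m.group(0) if m else text


text = "So much for Mr. Regex and his sentence matching."

# With no abbreviation guards, the pattern stops at "Mr."
print(first_sentence(text, abbreviations=()))

# With the guards, the whole sentence is returned
print(first_sentence(text))

# The guards cut both ways: "St." really does end this sentence,
# but it is treated as an abbreviation and skipped.
print(first_sentence("I live on Main St. It is quiet."))
```

The last call returns both sentences as one, which is the kind of failure the answer predicts: every abbreviation added to the list fixes one case and can break another.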
