I have a text file. I need to get a list of sentences.
How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.
Also, be wary of additional top level domains that aren't included in some of the answers above.
For example .info, .biz, .ru, .online will throw some sentence parsers but aren't included above.
Here's some info on frequency of top level domains: https://www.westhost.com/blog/the-most-popular-top-level-domains-in-2017/
That could be addressed by editing the code above to read:
alphabets= "([A-Za-z])"
prefixes = "(Mr|St|Mrs|Ms|Dr)[.]"
suffixes = "(Inc|Ltd|Jr|Sr|Co)"
starters = "(Mr|Mrs|Ms|Dr|He\s|She\s|It\s|They\s|Their\s|Our\s|We\s|But\s|However\s|That\s|This\s|Wherever)"
acronyms = "([A-Z][.][A-Z][.](?:[A-Z][.])?)"
websites = "[.](com|net|org|io|gov|ai|edu|co.uk|ru|info|biz|online)"