How can I split a text into sentences?

傲寒 2020-11-22 06:33

I have a text file. I need to get a list of sentences.

How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.

13 answers
  • 2020-11-22 07:02

    Instead of splitting the text into sentences with a regex, you can use the NLTK library. Its sentence tokenizer copes with abbreviations such as "Dr.":

    >>> from nltk import tokenize
    >>> p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
    
    >>> tokenize.sent_tokenize(p)
    ['Good morning Dr. Adams.', 'The patient is waiting for you in room number 3.']
    

    ref: https://stackoverflow.com/a/9474645/2877052
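
For comparison, here is a minimal sketch of the regex route that the question hints at, using only the standard library. It is an illustration, not how NLTK works internally: the `ABBREVIATIONS` set and the `split_sentences` helper are hypothetical names invented for this example, and the abbreviation list is hand-picked and far from complete.

```python
import re

# Hypothetical, hand-picked abbreviation list; extend it for your own text.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc"}


def split_sentences(text):
    """Naively split after '.', '!' or '?' followed by whitespace,
    skipping periods that end a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Word immediately before the punctuation mark.
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation
        sentences.append(text[start:match.start() + 1].strip())
        start = match.end()
    # Whatever is left after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
print(split_sentences(p))
```

On the sample text this gives the same two sentences as `sent_tokenize`, but it will mis-split on any abbreviation missing from the list, on initials, and on decimal numbers followed by a space, which is why a trained tokenizer is usually the safer choice.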
