How can I split a text into sentences?

傲寒 2020-11-22 06:33

I have a text file. I need to get a list of sentences.

How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.

13 answers
  •  渐次进展
    2020-11-22 07:02

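    Instead of splitting the text into sentences with a regex, you can use the nltk library. Its sent_tokenize function relies on NLTK's pretrained Punkt models, which may need a one-time download through nltk.download().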

    >>> from nltk import tokenize
    >>> p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
    
    >>> tokenize.sent_tokenize(p)
    ['Good morning Dr. Adams.', 'The patient is waiting for you in room number 3.']
    

    ref: https://stackoverflow.com/a/9474645/2877052
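
For comparison, here is a minimal standard-library sketch of the regex approach the answer sets aside, to show why abbreviations are the tricky part. The function name split_sentences and the ABBREVIATIONS set are hypothetical and chosen only for illustration; this is not how nltk works internally, and the abbreviation list is far from complete.

```python
import re

# Hypothetical, deliberately small abbreviation list; extend it for real text.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "st", "vs", "etc"}

# Candidate boundary: '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"[.!?]\s+")


def split_sentences(text):
    """Split text into sentences, skipping periods that end a known abbreviation."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # Word immediately before the punctuation mark.
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue  # e.g. "Dr.": treat as part of the current sentence
        sentences.append(text[start:match.start() + 1].strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
print(split_sentences(p))
```

On the sample text from the answer, this produces the same two sentences as sent_tokenize. It will still go wrong in cases a trained tokenizer handles better, for example a sentence that really does end in "etc." or an abbreviation missing from the list, which is the main argument for using nltk on real text.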
