NLTK sentence tokenizer: treat new lines as sentence boundaries
I am using NLTK's PunktSentenceTokenizer to split a text into sentences. However, the tokenizer doesn't seem to treat a new paragraph or a new line as the start of a new sentence.

    >>> from nltk.tokenize.punkt import PunktSentenceTokenizer
    >>> tokenizer = PunktSentenceTokenizer()
    >>> tokenizer.tokenize('Sentence 1 \n Sentence 2. Sentence 3.')
    ['Sentence 1 \n Sentence 2.', 'Sentence 3.']
    >>> tokenizer.span_tokenize('Sentence 1 \n Sentence 2. Sentence 3.')
    [(0, 24), (25, 36)]

I would like it to treat new lines as sentence boundaries as well. Is there any way to do this? (I need to keep the offsets too.)
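One possible approach, as a minimal sketch rather than a definitive answer: split the text into lines first, run the sentence tokenizer on each line, and shift every span by the offset at which that line starts. The helper names `spans_split_on_newlines` and `punkt_spans_by_line` are hypothetical and not part of NLTK. The only NLTK calls assumed are `PunktSentenceTokenizer()` and its `span_tokenize` method, both of which appear in the session above.

```python
import re


def spans_split_on_newlines(text, span_tokenize):
    """Yield (start, end) offsets into `text`, treating line breaks as boundaries.

    `span_tokenize` is any callable that maps a string to (start, end) pairs,
    for example PunktSentenceTokenizer().span_tokenize.
    """
    # Each match is one maximal run of non-newline characters, i.e. one line.
    for line in re.finditer(r"[^\r\n]+", text):
        chunk = line.group()
        if not chunk.strip():
            continue  # skip whitespace-only lines
        for start, end in span_tokenize(chunk):
            piece = chunk[start:end]
            stripped = piece.strip()
            if not stripped:
                continue
            # Drop leading whitespace, then shift back to offsets in `text`.
            lead = len(piece) - len(piece.lstrip())
            abs_start = line.start() + start + lead
            yield (abs_start, abs_start + len(stripped))


def punkt_spans_by_line(text):
    """Hypothetical convenience wrapper that uses an untrained Punkt tokenizer."""
    # Imported here so the generic helper above works without NLTK installed.
    from nltk.tokenize.punkt import PunktSentenceTokenizer

    tokenizer = PunktSentenceTokenizer()
    return list(spans_split_on_newlines(text, tokenizer.span_tokenize))
```

Calling `punkt_spans_by_line('Sentence 1 \n Sentence 2. Sentence 3.')` should give one span per line-delimited sentence, and slicing the original string with each `(start, end)` pair recovers the sentence text. Because the spans are trimmed of surrounding whitespace, they will not be identical to what `span_tokenize` returns on the unsplit text.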