Split Text into paragraphs NLTK - usage of nltk.tokenize.texttiling?

太阳男子 2021-01-13 18:32

I was looking at methods to split documents into paragraphs and I came across TextTiling as one possible way to do this.

Here is my attempt to use it. However, I don

1 Answer
  •  说谎 (OP)
    2021-01-13 18:47

    I'm experimenting with this myself right now for the same reason you are, and I had the same question, so don't be too upset if this is wrong. I figured it was best to pass on what little I know... :)

    I'm not sure yet, but I found an example of using the TextTilingTokenizer in this bug report:

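    import nltk
    nltk.download('gutenberg')
    nltk.download('stopwords')  # the tokenizer's default stopword list is loaded from this corpus
    alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
    ttt = nltk.tokenize.TextTilingTokenizer()
    tiles = ttt.tokenize(alice[140309:])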

    It appears that you just pass your text to the tokenize method of a TextTilingTokenizer instance, which returns the text split into a list of topical segments ("tiles").
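A minimal self-contained sketch of that call, assuming nltk and numpy are both installed (the TextTiling module uses numpy, but nltk itself does not pull it in). The two-topic sample text and the short stopword list below are hypothetical stand-ins. Passing `stopwords=` explicitly sidesteps the default, which loads the English list from nltk.corpus.stopwords and therefore needs nltk.download('stopwords'). The input must contain blank-line paragraph breaks, because tokenize() only cuts at existing breaks and raises a ValueError when it finds none.

```python
import nltk  # assumes nltk and numpy are installed

# Hypothetical sample: blank-line-separated paragraphs on two different topics.
cooking = (
    "The chef chopped onions and garlic, heated olive oil in a heavy pan, "
    "and stirred the vegetables slowly until they turned soft and golden. "
    "Fresh tomatoes, basil and a pinch of salt went into the sauce next, "
    "and the pan simmered while the pasta boiled in salted water."
)
astronomy = (
    "The telescope tracked a distant galaxy across the night sky while the "
    "astronomer recorded the faint light of its spiral arms. Stars, nebulae "
    "and dust clouds appeared in the long exposure, and the observatory "
    "dome turned quietly to follow the rotation of the earth."
)
# "\n\n" marks the paragraph breaks that TextTiling is allowed to cut at.
text = "\n\n".join([cooking] * 3 + [astronomy] * 3)

# Hypothetical tiny stopword list; passing it avoids loading nltk.corpus.stopwords.
stopwords = ["the", "a", "and", "of", "in", "its", "while", "into", "until", "they"]

ttt = nltk.tokenize.TextTilingTokenizer(stopwords=stopwords)
tiles = ttt.tokenize(text)  # list of strings, one per topical segment

for i, tile in enumerate(tiles):
    print(i, repr(tile[:40]))
```

tokenize() returns a list of strings whose concatenation is exactly the original text, with every cut placed at one of the existing paragraph breaks. So each tile is one or more whole paragraphs grouped by topic, not necessarily a single paragraph; how many tiles you get depends on the text and on the tokenizer's `w` and `k` parameters.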
