Split Text into paragraphs NLTK - usage of nltk.tokenize.texttiling?

I was looking at methods to split documents into paragraphs and I came across texttiling as one possible way to do this.

Here is my attempt to use it. However, I don

1 Answer

说谎

2021-01-13 18:47
I'm experimenting with this myself right now, for the same reason you are, and I had the same question, so don't be too upset if this turns out to be wrong. I figured it was best to pass on what little I know. :)

I'm not sure yet, but I found an example of using the TextTilingTokenizer in this bug report:
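```
import nltk  # needs the 'gutenberg' and 'stopwords' data: nltk.download(...)
alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
ttt = nltk.tokenize.TextTilingTokenizer()
# Tile only the tail of the book, from character offset 140309 onward
tiles = ttt.tokenize(alice[140309:])
```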
It appears that you want to feed your text to the `tokenize` method of the TextTilingTokenizer.
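
For anyone who wants to try this on their own text, here is a minimal sketch. The helper name `tile_text` and the blank-line check are my own assumptions, not part of NLTK. As far as I can tell, the tokenizer relies on blank lines to find paragraph breaks and can raise a `ValueError` on input without them, so the sketch returns the text as a single tile in that case. `w` and `k` are the constructor's pseudosentence-size and block-size parameters, shown here with their default values.

```python
import re


def tile_text(text, w=20, k=10):
    """Split text into topical tiles with NLTK's TextTilingTokenizer.

    tile_text is a hypothetical helper, not an NLTK function.
    """
    # Assumption: without a blank line there are no paragraph breaks for
    # the tokenizer to work with, so return the input as one tile.
    if not re.search(r"\n[ \t]*\n", text):
        return [text]

    # Imported here so the guard above works even without NLTK installed.
    # The default stopword list needs: nltk.download('stopwords')
    from nltk.tokenize import TextTilingTokenizer

    tokenizer = TextTilingTokenizer(w=w, k=k)
    return tokenizer.tokenize(text)
```

Calling `tile_text(raw_text)` on a document with blank-line paragraph breaks should give back a list of strings, each covering one or more paragraphs. Very short inputs may still raise a `ValueError`, so wrapping the call in `try`/`except` is probably worth it.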