I was looking at methods for splitting documents into paragraphs and came across TextTiling as one possible way to do this.
Here is my attempt to use it. However, I don
I'm messing around with this myself right now for the same reason and had the same question, so don't be too upset if this is wrong. I figured it was best to pass on what little I know... :)
I'm not sure yet, but I found an example of using the TextTilingTokenizer in this bug report:
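import nltk
# Assumes the 'gutenberg' and 'stopwords' corpora were already fetched with nltk.download()
alice = nltk.corpus.gutenberg.raw('carroll-alice.txt')
ttt = nltk.tokenize.TextTilingTokenizer()
# Tokenize only the tail of the book, starting at character offset 140309
tiles = ttt.tokenize(alice[140309:])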
It appears that you want to feed your text to the tokenize method on the TextTilingTokenizer, which gives you back a list of text segments.
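For your own documents, something like the following might be a starting point. It is only a rough sketch: split_into_tiles is a name I made up (it is not part of NLTK), and the blank-line check reflects my understanding that the tokenizer looks for paragraphs separated by blank lines, which I haven't fully confirmed. The tokenizer argument is also my own addition, so you can pass in a differently configured instance.

```python
def split_into_tiles(text, tokenizer=None):
    """Split text into segments using a TextTiling-style tokenizer.

    split_into_tiles is a hypothetical helper, not an NLTK function.
    """
    # Assumption: TextTiling relies on blank-line paragraph breaks in the input,
    # so fail early with a clear message if there are none.
    if "\n\n" not in text:
        raise ValueError("text has no blank-line paragraph breaks")

    if tokenizer is None:
        # Imported here so NLTK is only needed when the default is used.
        # The default tokenizer also needs the NLTK 'stopwords' corpus.
        from nltk.tokenize import TextTilingTokenizer
        tokenizer = TextTilingTokenizer()

    # tokenize() returns the segments as a list of strings.
    return list(tokenizer.tokenize(text))
```

You would call it as tiles = split_into_tiles(my_text) and then loop over tiles. In my limited experience very short inputs tend to fail or come back as a single segment, so try it on a reasonably long document first.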