There are many guides on how to tokenize a sentence, but I didn't find any on how to do the opposite.
import nltk
words = nltk.word_tokenize("I've found
You can use the Treebank detokenizer, TreebankWordDetokenizer:
from nltk.tokenize.treebank import TreebankWordDetokenizer
TreebankWordDetokenizer().detokenize(['the', 'quick', 'brown'])
# 'the quick brown'  (tokens are joined as-is; nothing is capitalized)
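To see that the detokenizer does more than join on spaces, here is a small round-trip sketch. The sample sentence is my own, and I use TreebankWordTokenizer rather than nltk.word_tokenize only so that no extra tokenizer data has to be downloaded. Treat it as an illustration, not a guarantee that every sentence round-trips exactly.

```python
from nltk.tokenize.treebank import TreebankWordTokenizer, TreebankWordDetokenizer

# Illustrative sentence (an assumption, not taken from the question).
sentence = "I've found it."

# The Treebank tokenizer splits contractions and trailing punctuation
# into separate tokens.
tokens = TreebankWordTokenizer().tokenize(sentence)

# The detokenizer reattaches contractions and punctuation instead of
# simply joining the tokens with spaces.
restored = TreebankWordDetokenizer().detokenize(tokens)

# For comparison: a plain join keeps the spaces around "'ve" and ".".
naive = " ".join(tokens)

print(tokens)
print(restored)
print(naive)
```

For this simple input, restored should match the original sentence, while the naive join does not. Unusual spacing or quoting in the input may not survive the round trip.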
There is also MosesDetokenizer, which used to be in nltk but was removed because of licensing issues. It is now available in the standalone Sacremoses package.