Python Untokenize a sentence

名媛妹妹 2021-02-01 15:46

There are so many guides on how to tokenize a sentence, but I didn't find any on how to do the opposite.

    import nltk
    words = nltk.word_tokenize("I've found

10 Answers
  •  攒了一身酷
    2021-02-01 16:14

    You can use NLTK's Treebank detokenizer, TreebankWordDetokenizer:

    from nltk.tokenize.treebank import TreebankWordDetokenizer
    TreebankWordDetokenizer().detokenize(['the', 'quick', 'brown'])
    # 'the quick brown'
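
For a full round trip, here is a minimal sketch that tokenizes a sentence and joins it back. It assumes NLTK is installed, and it uses TreebankWordTokenizer rather than nltk.word_tokenize only because word_tokenize also needs the Punkt sentence model downloaded. The round_trip helper is a made-up name for illustration, not part of NLTK.

```python
from nltk.tokenize.treebank import TreebankWordDetokenizer, TreebankWordTokenizer


def round_trip(sentence: str) -> str:
    """Tokenize a sentence, then join the tokens back into one string.

    round_trip is a hypothetical helper for this example only.
    """
    # Split into Treebank-style tokens (punctuation becomes separate tokens).
    tokens = TreebankWordTokenizer().tokenize(sentence)
    # Re-attach punctuation and join the tokens with single spaces.
    return TreebankWordDetokenizer().detokenize(tokens)


if __name__ == "__main__":
    print(round_trip("Hello, world!"))
```

Detokenizing is heuristic, so it will not always reproduce the original spacing exactly. Unusual whitespace in the input, for example, is not recoverable from the tokens.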
    

    There is also MosesDetokenizer. It used to ship with NLTK but was removed because of licensing issues; it is now available in the standalone sacremoses package.
