Python Untokenize a sentence

名媛妹妹 2021-02-01 15:46

There are many guides on how to tokenize a sentence, but I didn't find any on how to do the opposite.

 import nltk
 words = nltk.word_tokenize("I've found


        
10 Answers
  •  深忆病人
    2021-02-01 16:12

    To reverse nltk's word_tokenize, I suggest reading its source at http://www.nltk.org/_modules/nltk/tokenize/punkt.html#PunktLanguageVars.word_tokenize and doing some reverse engineering.

    Short of doing crazy hacks on nltk, you can try this:

    >>> import nltk
    >>> import string
    >>> nltk.word_tokenize("I've found a medicine for my disease.")
    ['I', "'ve", 'found', 'a', 'medicine', 'for', 'my', 'disease', '.']
    >>> tokens = nltk.word_tokenize("I've found a medicine for my disease.")
    >>> "".join([" "+i if not i.startswith("'") and i not in string.punctuation else i for i in tokens]).strip()
    "I've found a medicine for my disease."
    
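The one-liner above can also be written as a small reusable function. This is only a sketch of the same heuristic, not part of nltk: the name `untokenize` is made up for illustration, the token list is hard-coded from the output shown above so the snippet runs without nltk installed, and the extra check for `n't` assumes the Treebank-style split of contractions such as "didn't" into `did` and `n't`.

```python
import string


def untokenize(tokens):
    """Join word_tokenize-style tokens back into a sentence (heuristic).

    Hypothetical helper for illustration; it is not an nltk function.
    """
    pieces = []
    for tok in tokens:
        # Clitics such as "'ve" or "n't" and tokens made only of punctuation
        # are glued to the previous token; everything else gets a space.
        attaches_left = (
            tok.startswith("'")
            or tok == "n't"
            or all(ch in string.punctuation for ch in tok)
        )
        if pieces and not attaches_left:
            pieces.append(" ")
        pieces.append(tok)
    return "".join(pieces)


# Token list copied from the word_tokenize output shown above.
tokens = ['I', "'ve", 'found', 'a', 'medicine', 'for', 'my', 'disease', '.']
sentence = untokenize(tokens)
print(sentence)
```

Like the one-liner, this is lossy: the original spacing cannot be recovered, and opening brackets or opening quotes would be attached to the preceding word. If your NLTK version provides `nltk.tokenize.treebank.TreebankWordDetokenizer`, its `detokenize(tokens)` method is probably a better starting point than a hand-rolled rule.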
