python nltk.sent_tokenize error ascii codec can't decode

暗喜 2021-02-10 16:20

I could successfully read the text into a variable, but when I try to tokenize it I get this strange error:

sentences=nltk.sent_tokenize(sample)
Unicod         


        
1 answer
  • 2021-02-10 16:42

    In a nutshell, NLTK3's pos_tag function doesn't work.

    The NLTK2 function works fine, however.

    pip uninstall nltk

    pip install http://pypi.python.org/packages/source/n/nltk/nltk-2.0.4.tar.gz

    On the other hand, the tagger is pretty bad (apparently 'conservatory' is a verb). I wish SpaCy worked on Windows.
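For the decode error in the question itself, a common cause of an "ascii codec can't decode" message is passing raw bytes that contain non-ASCII characters, which Python 2 then tries to decode implicitly as ASCII. Here is a minimal sketch of decoding explicitly before tokenizing. The helper names `to_text` and `split_sentences` are hypothetical, and UTF-8 is only an assumption about how the input is encoded; the question does not say.

```python
def to_text(raw, encoding="utf-8"):
    """Return decoded text.

    Bytes are decoded explicitly so Python never falls back to the
    ASCII codec. The UTF-8 default is an assumption about the input.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def split_sentences(raw, encoding="utf-8"):
    # Imported here so to_text() stays usable without NLTK installed.
    # nltk.sent_tokenize also needs the Punkt tokenizer data downloaded.
    import nltk
    return nltk.sent_tokenize(to_text(raw, encoding))
```

With this in place, something like `split_sentences(open("sample.txt", "rb").read())` would tokenize the decoded text, where `sample.txt` stands in for whatever file was read. If the file is not actually UTF-8, pass the right codec as `encoding`.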
