What are all possible POS tags of NLTK?

你的背包 2020-12-02 03:36

How do I find a list of all possible POS tags used by the Natural Language Toolkit (NLTK)?

8 Answers
  • 2020-12-02 04:33

    The tag set depends on the corpus that was used to train the tagger. The default tagger of nltk.pos_tag() uses the Penn Treebank Tag Set.

    In NLTK 2, you could check which tagger is the default tagger as follows:

    >>> import nltk
    >>> nltk.tag._POS_TAGGER
    'taggers/maxent_treebank_pos_tagger/english.pickle'
    

    That means that it's a Maximum Entropy tagger trained on the Treebank corpus.

    nltk.tag._POS_TAGGER no longer exists in NLTK 3, but the documentation states that the off-the-shelf tagger still uses the Penn Treebank tagset.
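
Since the tag set depends on the training corpus, one way to see which tags actually occur is to collect them from tagged data. The following is a minimal sketch, not part of NLTK: `tag_counts` and `sample` are hypothetical names, and the sample is hand-written in the `(word, tag)` format that `nltk.pos_tag()` returns, so the snippet runs without NLTK installed.

```python
from collections import Counter


def tag_counts(tagged_sents):
    """Count how often each POS tag occurs in an iterable of tagged sentences.

    Each sentence is expected to be a list of (word, tag) pairs.
    """
    return Counter(tag for sent in tagged_sents for _word, tag in sent)


# Hand-written sample (an assumption for illustration, not real corpus data).
sample = [
    [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), (".", ".")],
    [("Dogs", "NNS"), ("bark", "VBP"), (".", ".")],
]

counts = tag_counts(sample)
print(sorted(counts))  # distinct tags seen in the sample
```

If NLTK and its Treebank sample corpus are installed, the same function should accept `nltk.corpus.treebank.tagged_sents()`, which lists only the tags that occur in that sample. For the documented tag set with descriptions, `nltk.help.upenn_tagset()` prints every Penn Treebank tag, provided the tagset help data has been downloaded with `nltk.download()`.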

  • 2020-12-02 04:38

    You can download the list here: ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz. The guide also covers confusing parts of speech, capitalization, and other conventions. Wikipedia has a similar list as well, in the section "Part-of-speech tags used".
