What are all possible pos tags of NLTK?

How do I find a list of all possible POS tags used by the Natural Language Toolkit (NLTK)?

8 Answers

自闭症患者

2020-12-02 04:33
The tag set depends on the corpus that was used to train the tagger. The default tagger of nltk.pos_tag() uses the Penn Treebank Tag Set.

In NLTK 2, you could check which tagger is the default tagger as follows:
```
>>> import nltk
>>> nltk.tag._POS_TAGGER
'taggers/maxent_treebank_pos_tagger/english.pickle'
```
That means that it's a Maximum Entropy tagger trained on the Treebank corpus.

nltk.tag._POS_TAGGER no longer exists in NLTK 3, but the documentation states that the off-the-shelf tagger still uses the Penn Treebank tagset.
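
For NLTK 3, a rough sketch of two ways to look at the tags: nltk.help.upenn_tagset() prints the Penn Treebank tags with definitions and examples, provided the tagset help data has been downloaded (the data package name varies between NLTK versions, so that step is left out here). The helper observed_tags and the hand-written sample are hypothetical, added only to show how to collect the tags that actually occur in tagged output.

```python
def observed_tags(tagged_tokens):
    """Return the sorted distinct tags in a list of (word, tag) pairs."""
    return sorted({tag for _word, tag in tagged_tokens})


# Hand-written sample in the (word, tag) shape that nltk.pos_tag() returns.
# The tags are Penn Treebank tags; the sentence itself is illustrative.
sample = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", "."),
]
print(observed_tags(sample))

try:
    import nltk

    # Prints each Penn Treebank tag with a definition and examples.
    # Assumes the NLTK tagset help data was downloaded beforehand.
    nltk.help.upenn_tagset()
except (ImportError, LookupError) as err:
    # ImportError: NLTK is not installed; LookupError: help data is missing.
    print(f"Tagset help unavailable: {err}")
```

Note that observed_tags only reports tags present in the input, so it complements the full listing rather than replacing it.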
野的像风

2020-12-02 04:38

You can download the Penn Treebank tagging guide here: ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz. It covers confusing parts of speech, capitalization, and other conventions. Wikipedia also has a similar section, titled "Part-of-speech tags used".
