How do I find a list of all possible POS tags used by the Natural Language Toolkit (NLTK)?
The tag set depends on the corpus that was used to train the tagger.
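Because the tagset is whatever the training corpus uses, one way to see it is to collect the distinct tags from a tagged corpus. The sketch below is plain Python and not part of NLTK's API. The function names tag_frequencies and tagset are hypothetical. It assumes the corpus is an iterable of sentences, each a list of (word, tag) pairs, which is the shape NLTK corpus readers return from tagged_sents().

```python
from collections import Counter
from typing import Iterable, List, Tuple

# One tagged sentence: a list of (word, tag) pairs, the same shape that
# NLTK corpus readers return from tagged_sents().
TaggedSentence = List[Tuple[str, str]]


def tag_frequencies(tagged_sents: Iterable[TaggedSentence]) -> Counter:
    """Count how often each POS tag occurs in a tagged corpus."""
    return Counter(tag for sent in tagged_sents for _word, tag in sent)


def tagset(tagged_sents: Iterable[TaggedSentence]) -> List[str]:
    """Return the distinct tags in a tagged corpus, sorted alphabetically."""
    return sorted(tag_frequencies(tagged_sents))


if __name__ == "__main__":
    # Tiny hand-tagged sample standing in for a real training corpus.
    sample = [
        [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), (".", ".")],
        [("Dogs", "NNS"), ("bark", "VBP"), (".", ".")],
    ]
    print(tagset(sample))
    print(tag_frequencies(sample).most_common(3))
```

With NLTK installed and the treebank sample downloaded (nltk.download('treebank')), passing nltk.corpus.treebank.tagged_sents() should list the tags used in NLTK's Penn Treebank sample. Expect it to also contain non-word tags such as the one used for trace elements.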
The default tagger of nltk.pos_tag() uses the Penn Treebank tagset.
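For quick reference, here is a sketch that hard-codes the 36 word-level Penn Treebank tags with short descriptions. The descriptions are paraphrased from the Penn Treebank tagging guidelines rather than taken from NLTK. Punctuation and symbol tags (such as ".", ",", ":", "$" and "#") are deliberately left out. The helper describe() is hypothetical.

```python
# The 36 word-level tags of the Penn Treebank tagset (paraphrased
# descriptions). Punctuation and symbol tags are intentionally omitted.
PENN_TREEBANK_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "EX": "existential there",
    "FW": "foreign word",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
    "LS": "list item marker",
    "MD": "modal",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "PDT": "predeterminer",
    "POS": "possessive ending",
    "PRP": "personal pronoun",
    "PRP$": "possessive pronoun",
    "RB": "adverb",
    "RBR": "adverb, comparative",
    "RBS": "adverb, superlative",
    "RP": "particle",
    "SYM": "symbol",
    "TO": "to",
    "UH": "interjection",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
    "WDT": "wh-determiner",
    "WP": "wh-pronoun",
    "WP$": "possessive wh-pronoun",
    "WRB": "wh-adverb",
}


def describe(tag: str) -> str:
    """Return a short description of a Penn Treebank tag, or a fallback."""
    return PENN_TREEBANK_TAGS.get(tag, "not a word-level Penn Treebank tag")


if __name__ == "__main__":
    # Print the whole table, one tag per line.
    for tag in sorted(PENN_TREEBANK_TAGS):
        print(f"{tag:5} {describe(tag)}")
```

NLTK can also print its own descriptions and examples with nltk.help.upenn_tagset(), optionally filtered by a regular expression such as 'NN.*'. This requires downloading the tagset help data with nltk.download() first.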
In NLTK 2, you could check which tagger is the default tagger as follows:
>>> import nltk
>>> nltk.tag._POS_TAGGER
'taggers/maxent_treebank_pos_tagger/english.pickle'
That means that it's a Maximum Entropy tagger trained on the Treebank corpus.
nltk.tag._POS_TAGGER no longer exists in NLTK 3, but the documentation states that the off-the-shelf tagger still uses the Penn Treebank tagset.
You can download the list here: ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz. It also covers confusing parts of speech, capitalization, and other conventions. Wikipedia has a similar list in its section "Part-of-speech tags used".