What do spaCy's part-of-speech and dependency tags mean?


spaCy tags up each of the Tokens in a Document with a part of speech (in two different formats, one stored in the pos and pos_


4 Answers

北荒

2021-01-30 02:58

At present, dependency parsing and tagging in spaCy appear to be implemented only at the word level, not at the phrase level (other than noun phrases) or the clause level. This means spaCy can identify things like nouns (NN, NNS), adjectives (JJ, JJR, JJS), and verbs (VB, VBD, VBG, etc.), but not adjective phrases (ADJP), adverbial phrases (ADVP), or questions (SBARQ, SQ).

For illustration, when you use spaCy to parse the sentence "Which way is the bus going?", you get the following tree.

By contrast, if you use the Stanford parser you get a much more deeply structured syntax tree.
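
To inspect the word-level annotations yourself, a minimal sketch along these lines may help. It assumes spaCy and the `en_core_web_sm` English model are installed (the model name is my assumption, it is not stated above), and `token_rows` is just a hypothetical helper name. If either is missing, it returns an empty list instead of failing.

```python
def token_rows(text, model="en_core_web_sm"):
    """Return (text, coarse POS, fine tag, dependency label, head text) per token.

    Returns [] when spaCy or the assumed model is not installed.
    """
    try:
        import spacy
        nlp = spacy.load(model)  # raises OSError if the model is absent
    except (ImportError, OSError):
        return []
    doc = nlp(text)
    # Every annotation hangs off an individual token, not a phrase or clause.
    return [(t.text, t.pos_, t.tag_, t.dep_, t.head.text) for t in doc]


for row in token_rows("Which way is the bus going?"):
    print(row)
```

Each row describes a single word and the word it attaches to. The one phrase-level view spaCy offers, noun phrases, is exposed separately through `doc.noun_chunks`.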

無奈伤痛

2021-01-30 02:59
tl;dr answer

Just expand the lists at:
- https://spacy.io/api/annotation#pos-tagging (POS tags) and
- https://spacy.io/api/annotation#dependency-parsing (dependency tags)
Longer answer

The docs have greatly improved since I first asked this question, and spaCy now documents this much better.

Part-of-speech tags

The pos and tag attributes are tabulated at https://spacy.io/api/annotation#pos-tagging, and the origin of those lists of values is described. At the time of this (January 2020) edit, the docs say of the pos attribute that:

spaCy maps all language-specific part-of-speech tags to a small, fixed set of word type tags following the Universal Dependencies scheme. The universal tags don’t code for any morphological features and only cover the word type. They’re available as the Token.pos and Token.pos_ attributes.

As for the tag attribute, the docs say:

The English part-of-speech tagger uses the OntoNotes 5 version of the Penn Treebank tag set. We also map the tags to the simpler Universal Dependencies v2 POS tag set.

and

The German part-of-speech tagger uses the TIGER Treebank annotation scheme. We also map the tags to the simpler Universal Dependencies v2 POS tag set.

You thus have a choice between using a coarse-grained tag set that is consistent across languages (.pos), or a fine-grained tag set (.tag) that is specific to a particular treebank, and hence a particular language.
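
The relationship between the two tag sets is many-to-one: several fine-grained tags collapse into a single coarse tag. Here is a rough sketch in plain Python, using only a handful of Penn Treebank tags. The `FINE_TO_COARSE` table is hand-written for illustration and is not spaCy's own mapping (spaCy may, for instance, give an auxiliary verb the coarse tag AUX rather than VERB).

```python
from collections import defaultdict

# Hand-written for illustration only; NOT spaCy's internal mapping table.
FINE_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
}


def group_by_coarse(mapping):
    """Invert a fine->coarse mapping into coarse -> [fine tags]."""
    groups = defaultdict(list)
    for fine, coarse in mapping.items():
        groups[coarse].append(fine)
    return dict(groups)


for coarse, fines in group_by_coarse(FINE_TO_COARSE).items():
    print(coarse, "<-", ", ".join(fines))
```

If you only need to know whether a word is a noun, `.pos_` saves you from enumerating NN, NNS and so on. If you need to distinguish singular from plural, you need `.tag_`.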

.pos_ tag list

The docs list the following coarse-grained tags used for the pos and pos_ attributes:
- ADJ: adjective, e.g. big, old, green, incomprehensible, first
- ADP: adposition, e.g. in, to, during
- ADV: adverb, e.g. very, tomorrow, down, where, there
- AUX: auxiliary, e.g. is, has (done), will (do), should (do)
- CONJ: conjunction, e.g. and, or, but
- CCONJ: coordinating conjunction, e.g. and, or, but
- DET: determiner, e.g. a, an, the
- INTJ: interjection, e.g. psst, ouch, bravo, hello
- NOUN: noun, e.g. girl, cat, tree, air, beauty
- NUM: numeral, e.g. 1, 2017, one, seventy-seven, IV, MMXIV
- PART: particle, e.g. ’s, not
- PRON: pronoun, e.g. I, you, he, she, myself, themselves, somebody
- PROPN: proper noun, e.g. Mary, John, London, NATO, HBO
- PUNCT: punctuation, e.g. ., (, ), ?
- SCONJ: subordinating conjunction, e.g. if, while, that
- SYM: symbol, e.g. $, %, §, ©, +, −, ×, ÷, =, :)
庸人自扰

2021-01-30 03:05
The official documentation now provides much more detail on all of these annotations at https://spacy.io/api/annotation (and the list of other token attributes can be found at https://spacy.io/api/token).

As the documentation shows, both the part-of-speech (POS) tags and the dependency tags come in a universal variant and in language-specific variants. The explain() function is a very useful shortcut for getting a description of a tag's meaning without consulting the documentation, e.g.
```
import spacy

spacy.explain("VBD")
```
which gives "verb, past tense".
感情败类

2021-01-30 03:15
Just a quick tip for getting the detailed meaning of the short forms. You can use the explain function like this:
```
import spacy

spacy.explain('pobj')
```
which will give you output like:
```
'object of preposition'
```
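
If you want to look up several labels at once, something like the sketch below might do. `explain_all` is a hypothetical helper name, not part of spaCy. It falls back to `None` for every label when spaCy is not installed, and `spacy.explain` itself also returns `None` for labels it does not know.

```python
import importlib.util


def explain_all(labels):
    """Map each label to spacy.explain(label), or to None if spaCy is absent."""
    if importlib.util.find_spec("spacy") is None:
        return {label: None for label in labels}
    import spacy
    return {label: spacy.explain(label) for label in labels}


# POS tags and dependency labels share the same lookup.
for label, meaning in explain_all(["VBD", "pobj", "ADP", "NNS"]).items():
    print(f"{label:5} {meaning}")
```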
