Java Stanford NLP: Part of Speech labels?


 10  2007

The Stanford NLP parser, demo'd here, gives output like this:

Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.

What do the Part of Speech labels mean?


野趣味

2020-11-27 09:17

I am providing the whole list here, along with a reference link:

1.  CC   Coordinating conjunction
2.  CD   Cardinal number
3.  DT   Determiner
4.  EX   Existential there
5.  FW   Foreign word
6.  IN   Preposition or subordinating conjunction
7.  JJ   Adjective
8.  JJR  Adjective, comparative
9.  JJS  Adjective, superlative
10. LS   List item marker
11. MD   Modal
12. NN   Noun, singular or mass
13. NNS  Noun, plural
14. NNP  Proper noun, singular
15. NNPS Proper noun, plural
16. PDT  Predeterminer
17. POS  Possessive ending
18. PRP  Personal pronoun
19. PRP$ Possessive pronoun
20. RB   Adverb
21. RBR  Adverb, comparative
22. RBS  Adverb, superlative
23. RP   Particle
24. SYM  Symbol
25. TO   to
26. UH   Interjection
27. VB   Verb, base form
28. VBD  Verb, past tense
29. VBG  Verb, gerund or present participle
30. VBN  Verb, past participle
31. VBP  Verb, non-3rd person singular present
32. VBZ  Verb, 3rd person singular present
33. WDT  Wh-determiner
34. WP   Wh-pronoun
35. WP$  Possessive wh-pronoun
36. WRB  Wh-adverb

You can find the full list of part-of-speech tags here.
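To decode output like the question's `word/TAG` string programmatically, one option is to keep the table above in a map and look each tag up. This is a minimal sketch, assuming whitespace-separated `word/TAG` pairs; the class `PosTagLookup` and method `explain` are hypothetical names, and only a subset of the 36 tags is filled in (the `.` entry comes from the punctuation tags listed further down).

```java
import java.util.HashMap;
import java.util.Map;

public class PosTagLookup {
    // Hypothetical helper: a SUBSET of the Penn Treebank tags above; add the rest as needed.
    static final Map<String, String> PENN_TAGS = new HashMap<>();
    static {
        PENN_TAGS.put("JJ", "Adjective");
        PENN_TAGS.put("NN", "Noun, singular or mass");
        PENN_TAGS.put("NNS", "Noun, plural");
        PENN_TAGS.put("NNP", "Proper noun, singular");
        PENN_TAGS.put("NNPS", "Proper noun, plural");
        PENN_TAGS.put("RB", "Adverb");
        PENN_TAGS.put("VBP", "Verb, non-3rd person singular present");
        PENN_TAGS.put("VBZ", "Verb, 3rd person singular present");
        PENN_TAGS.put(".", "Sentence-final punctuation");
    }

    /** Splits "word/TAG" pairs and returns one "word -> TAG: description" line per token. */
    public static String explain(String tagged) {
        StringBuilder sb = new StringBuilder();
        for (String pair : tagged.trim().split("\\s+")) {
            int slash = pair.lastIndexOf('/');   // last slash, so a word containing '/' stays intact
            if (slash <= 0) {
                continue;                        // not in word/TAG form; skip it
            }
            String word = pair.substring(0, slash);
            String tag = pair.substring(slash + 1);
            sb.append(word).append(" -> ").append(tag).append(": ")
              .append(PENN_TAGS.getOrDefault(tag, "(not in this table)"))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(explain("Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./."));
    }
}
```

For the question's sentence this prints one line per token, for example `ideas -> NNS: Noun, plural`; unknown tags fall back to `(not in this table)` instead of throwing.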


北恋

2020-11-27 09:18

Regarding your second question, about finding words or chunks tagged with a particular POS (e.g., nouns), here is sample code you can follow:

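import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class NounExtractor {
    public static void main(String[] args) {
        // POS tags are all this example needs; lemma, ner and parse would only slow it down.
        Properties properties = new Properties();
        properties.setProperty("annotators", "tokenize, ssplit, pos");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);

        String input = "Colorless green ideas sleep furiously.";
        Annotation annotation = pipeline.process(input);
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        List<String> output = new ArrayList<>();

        // TokensRegex: a single token whose POS tag is one of the four noun tags.
        // Compiled once, outside the loop.
        TokenSequencePattern pattern = TokenSequencePattern.compile("[{pos:/NN|NNS|NNP|NNPS/}]");
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
            while (matcher.find()) {
                output.add(matcher.group());
            }
        }
        System.out.println("Input: " + input);
        System.out.println("Output: " + output);
    }
}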

The output is:

Input: Colorless green ideas sleep furiously.
Output: [ideas]
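If you only have the tagger's plain `word/TAG` text (as in the question) rather than CoreNLP objects, a plain Java regex can do the same noun filtering without CoreNLP on the classpath. This is a minimal sketch, assuming whitespace-separated `word/TAG` pairs; `NounFilter` and `nouns` are hypothetical names, and the pattern only accepts the four Penn Treebank noun tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NounFilter {
    // Hypothetical helper: a "word/TAG" pair whose whole tag is NN, NNS, NNP or NNPS.
    // The lookahead stops NN from matching just the first two letters of NNS or NNP.
    private static final Pattern NOUN = Pattern.compile("(\\S+)/(NN|NNS|NNP|NNPS)(?=\\s|$)");

    public static List<String> nouns(String tagged) {
        List<String> result = new ArrayList<>();
        Matcher m = NOUN.matcher(tagged);
        while (m.find()) {
            result.add(m.group(1));   // group 1 is the word without its tag
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nouns("Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./."));
    }
}
```

On the question's sentence this prints `[ideas]`, matching the CoreNLP output above. The TokensRegex version is still the better choice when you already have annotated tokens, since it does not depend on the text format.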


误落风尘

2020-11-27 09:35
Here is a more complete list of tags for the Penn Treebank (posted here for the sake of completeness):

http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html

It also includes tags for clause and phrase levels.

Clause Level
```
- S
- SBAR
- SBARQ
- SINV
- SQ
```
Phrase Level
```
- ADJP
- ADVP
- CONJP
- FRAG
- INTJ
- LST
- NAC
- NP
- NX
- PP
- PRN
- PRT
- QP
- RRC
- UCP
- VP
- WHADJP
- WHADVP
- WHNP
- WHPP
- X
```
(Descriptions are given at the link above.)
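To see which clause- and phrase-level labels appear in a bracketed parse tree (the Penn Treebank notation), you can walk the brackets and keep every node that has sub-constituents; nodes whose only children are words are POS tags. This is a minimal sketch: the tree string below is hand-written for illustration rather than copied from parser output, the input is assumed to be well-formed, and `PhraseLabels` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseLabels {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PhraseLabels(String tree) {
        // Pad the brackets with spaces so the string splits into "(", ")" and atoms.
        for (String t : tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            tokens.add(t);
        }
    }

    /** Labels of clause/phrase nodes in pre-order; POS preterminals are skipped. */
    public static List<String> phraseLabels(String tree) {
        List<String> out = new ArrayList<>();
        new PhraseLabels(tree).node(out);
        return out;
    }

    // Parses "( LABEL child* )" starting at tokens[pos].
    private void node(List<String> out) {
        pos++;                                   // consume "("
        String label = tokens.get(pos++);
        boolean isPhrase = false;
        while (!tokens.get(pos).equals(")")) {
            if (tokens.get(pos).equals("(")) {
                if (!isPhrase) {                 // first sub-constituent: record this node as a phrase
                    out.add(label);
                    isPhrase = true;
                }
                node(out);
            } else {
                pos++;                           // a leaf word: its parent is a POS tag
            }
        }
        pos++;                                   // consume ")"
    }

    public static void main(String[] args) {
        String tree = "(S (NP (JJ Colorless) (JJ green) (NNS ideas)) "
                + "(VP (VBP sleep) (ADVP (RB furiously))) (. .))";
        System.out.println(phraseLabels(tree));
    }
}
```

For the hand-written tree in `main` this prints `[S, NP, VP, ADVP]`: one clause-level label (S) and three phrase-level labels, while JJ, NNS, VBP, RB and `.` are recognized as word-level POS tags.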
没有蜡笔的小新

2020-11-27 09:36
These are the tags of the Penn Treebank Project; look at its part-of-speech tagging guidelines (a .ps file).

JJ is an adjective, NNS is a plural noun, VBP is a present-tense verb (non-3rd person singular), and RB is an adverb.

That's for English. For Chinese, it's the Penn Chinese Treebank, and for German it's the NEGRA corpus.

1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb
暗喜

2020-11-27 09:37
The accepted answer above is missing the following information:

Nine punctuation tags are also defined (they are not listed in some references; see here). They are:
1. #
2. $
3. '' (used for all forms of closing quote)
4. ( (used for all forms of opening parenthesis)
5. ) (used for all forms of closing parenthesis)
6. ,
7. . (used for all sentence-ending punctuation)
8. : (used for colons, semicolons and ellipses)
9. `` (used for all forms of opening quote)
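A common use of this list is dropping punctuation tokens before counting or filtering words. This is a minimal sketch, assuming whitespace-separated `word/TAG` pairs as in the question; `PunctuationTags` and its methods are hypothetical names. Note that, depending on tokenizer settings, CoreNLP may tag parentheses as `-LRB-`/`-RRB-` rather than `(`/`)`; check your own output and extend the set if needed (an assumption, not something stated in this thread).

```java
import java.util.Set;
import java.util.StringJoiner;

public class PunctuationTags {
    // The nine Penn Treebank punctuation tags listed above.
    public static final Set<String> PUNCT_TAGS =
            Set.of("#", "$", "''", "(", ")", ",", ".", ":", "``");

    public static boolean isPunctuationTag(String tag) {
        return PUNCT_TAGS.contains(tag);   // exact match, so PRP$ and WP$ are not punctuation
    }

    /** Removes every "word/TAG" pair whose tag is a punctuation tag. */
    public static String withoutPunctuation(String tagged) {
        StringJoiner kept = new StringJoiner(" ");
        for (String pair : tagged.trim().split("\\s+")) {
            String tag = pair.substring(pair.lastIndexOf('/') + 1);   // whole token if there is no '/'
            if (!isPunctuationTag(tag)) {
                kept.add(pair);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(withoutPunctuation("Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./."));
    }
}
```

For the question's sentence this prints the same pairs without the final `./.` token.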
感情败类

2020-11-27 09:37
Stanford CoreNLP Tags for Other Languages: French, Spanish, German, ...

I see you are using the parser for English, which is the default model. You can also use it for other languages (French, Spanish, German, ...), but be aware that both the tokenizer and the part-of-speech tagger differ for each language. To do that, you must download the model for that language (using a build tool such as Maven, for example) and then set the model you want to use. More information about that is available here.

Here are the tag lists for different languages:
1. Stanford CoreNLP POS Tags for Spanish
2. Stanford CoreNLP POS Tagger for German uses the Stuttgart-Tübingen Tag Set (STTS)
3. Stanford CoreNLP POS tagger for French uses the following tags:
TAGS FOR FRENCH:

Part of Speech Tags for French:
```
A     (adjective)
Adv   (adverb)
CC    (coordinating conjunction)
Cl    (weak clitic pronoun)
CS    (subordinating conjunction)
D     (determiner)
ET    (foreign word)
I     (interjection)
NC    (common noun)
NP    (proper noun)
P     (preposition)
PREF  (prefix)
PRO   (strong pronoun)
V     (verb)
PONCT (punctuation mark)
```
Phrasal Category Tags for French:
```
AP     (adjectival phrases)
AdP    (adverbial phrases)
COORD  (coordinated phrases)
NP     (noun phrases)
PP     (prepositional phrases)
VN     (verbal nucleus)
VPinf  (infinitive clauses)
VPpart (nonfinite clauses)
SENT   (sentences)
Sint, Srel, Ssub (finite clauses)
```
Syntactic Functions for French:
```
SUJ    (subject)
OBJ    (direct object)
ATS    (predicative complement of a subject)
ATO    (predicative complement of a direct object)
MOD    (modifier or adjunct)
A-OBJ  (indirect complement introduced by à)
DE-OBJ (indirect complement introduced by de)
P-OBJ  (indirect complement introduced by another preposition)
```