Java Stanford NLP: Part of Speech labels?

借酒劲吻你 2020-11-27 08:59

The Stanford NLP, demo'd here, gives an output like this:

Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.

What do the Part of S

10 Answers
  • 2020-11-27 09:17

    Here is the whole list of Penn Treebank part-of-speech tags, followed by a reference link:

    1.  CC   Coordinating conjunction
    2.  CD   Cardinal number
    3.  DT   Determiner
    4.  EX   Existential there
    5.  FW   Foreign word
    6.  IN   Preposition or subordinating conjunction
    7.  JJ   Adjective
    8.  JJR  Adjective, comparative
    9.  JJS  Adjective, superlative
    10. LS   List item marker
    11. MD   Modal
    12. NN   Noun, singular or mass
    13. NNS  Noun, plural
    14. NNP  Proper noun, singular
    15. NNPS Proper noun, plural
    16. PDT  Predeterminer
    17. POS  Possessive ending
    18. PRP  Personal pronoun
    19. PRP$ Possessive pronoun
    20. RB   Adverb
    21. RBR  Adverb, comparative
    22. RBS  Adverb, superlative
    23. RP   Particle
    24. SYM  Symbol
    25. TO   to
    26. UH   Interjection
    27. VB   Verb, base form
    28. VBD  Verb, past tense
    29. VBG  Verb, gerund or present participle
    30. VBN  Verb, past participle
    31. VBP  Verb, non-3rd person singular present
    32. VBZ  Verb, 3rd person singular present
    33. WDT  Wh-determiner
    34. WP   Wh-pronoun
    35. WP$  Possessive wh-pronoun
    36. WRB  Wh-adverb
    

    You can find the whole list of part-of-speech tags here.
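
    To make the list concrete, here is a minimal sketch of decoding the `word/TAG` output from the question against a small lookup table. The class and method names (`PosTagLookup`, `splitToken`, `describe`) are made up for this illustration and are not part of Stanford NLP, and the table holds only the four tags that appear in the example sentence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PosTagLookup {
    // Partial table: only the tags used in the example, copied from the list above.
    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("JJ", "Adjective");
        DESCRIPTIONS.put("NNS", "Noun, plural");
        DESCRIPTIONS.put("VBP", "Verb, non-3rd person singular present");
        DESCRIPTIONS.put("RB", "Adverb");
    }

    // Splits "word/TAG" on the LAST slash, so a word that itself contains '/' still parses.
    static String[] splitToken(String token) {
        int slash = token.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Not a word/TAG token: " + token);
        }
        return new String[] { token.substring(0, slash), token.substring(slash + 1) };
    }

    // Returns the description for a tag, or a marker when the tag is not in the partial table.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "(not in this partial table)");
    }

    public static void main(String[] args) {
        String tagged = "Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.";
        for (String token : tagged.split("\\s+")) {
            String[] parts = splitToken(token);
            System.out.println(parts[0] + " -> " + parts[1] + " (" + describe(parts[1]) + ")");
        }
    }
}
```

    Splitting on the last slash rather than the first is a deliberate choice here: the final token `./.` then still yields the word `.` and the tag `.`.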

  • 2020-11-27 09:18

    Regarding your second question, about finding the words or chunks tagged with a particular POS (e.g., nouns), here is sample code you can follow.

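    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
    import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class NounExtractor {
        public static void main(String[] args) {
            Properties properties = new Properties();
            properties.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);

            String input = "Colorless green ideas sleep furiously.";
            Annotation annotation = pipeline.process(input);
            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
            List<String> output = new ArrayList<>();
            // TokensRegex pattern: a single token whose POS tag is one of the noun tags
            String regex = "([{pos:/NN|NNS|NNP|NNPS/}])";
            // Compile once and reuse the pattern for every sentence
            TokenSequencePattern pattern = TokenSequencePattern.compile(regex);
            for (CoreMap sentence : sentences) {
                List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
                TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
                while (matcher.find()) {
                    output.add(matcher.group()); // text of the matched token
                }
            }
            System.out.println("Input: " + input);
            System.out.println("Output: " + output);
        }
    }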
    

    The output is:

    Input: Colorless green ideas sleep furiously.
    Output: [ideas]
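
    If you only have the already-tagged `word/TAG` text rather than a live pipeline, the same selection can be sketched with the standard library alone. This is not the TokensRegex API; it is a plain `java.util.regex` approximation, and the names `TaggedNounFilter`, `wordsWithTag`, and `nouns` are hypothetical helpers introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TaggedNounFilter {
    // The four Penn Treebank noun tags; matches() below requires the whole tag to match.
    private static final Pattern NOUN_TAG = Pattern.compile("NN|NNS|NNP|NNPS");

    // Returns the words whose tag fully matches tagPattern, in input order.
    static List<String> wordsWithTag(String tagged, Pattern tagPattern) {
        List<String> words = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                continue; // skip anything that is not in word/TAG form
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            if (tagPattern.matcher(tag).matches()) {
                words.add(word);
            }
        }
        return words;
    }

    static List<String> nouns(String tagged) {
        return wordsWithTag(tagged, NOUN_TAG);
    }

    public static void main(String[] args) {
        String tagged = "Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.";
        System.out.println("Output: " + nouns(tagged)); // prints "Output: [ideas]"
    }
}
```

    Passing a different `Pattern` (for example one built from `VB|VBD|VBG|VBN|VBP|VBZ`) would select verbs instead.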
    
  • 2020-11-27 09:35

    Here is a more complete list of tags for the Penn Treebank (posted here for the sake of completeness):

    http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html

    It also includes tags for clause and phrase levels.

    Clause Level

    - S
    - SBAR
    - SBARQ
    - SINV
    - SQ
    

    Phrase Level

    - ADJP
    - ADVP
    - CONJP
    - FRAG
    - INTJ
    - LST
    - NAC
    - NP
    - NX
    - PP
    - PRN
    - PRT
    - QP
    - RRC
    - UCP
    - VP
    - WHADJP
    - WHADVP
    - WHNP
    - WHPP
    - X
    

    (descriptions in the link)
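
    To show where these labels turn up, here is a small sketch that pulls the node labels out of a bracketed parse string and keeps the clause-level or phrase-level ones. The parse string is hand-written for illustration, not captured parser output, and `ConstituentLabels` and `labelsIn` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConstituentLabels {
    // Label sets taken from the clause-level and phrase-level lists above.
    static final Set<String> CLAUSE_LEVEL = new HashSet<>(Arrays.asList(
            "S", "SBAR", "SBARQ", "SINV", "SQ"));
    static final Set<String> PHRASE_LEVEL = new HashSet<>(Arrays.asList(
            "ADJP", "ADVP", "CONJP", "FRAG", "INTJ", "LST", "NAC", "NP", "NX", "PP", "PRN",
            "PRT", "QP", "RRC", "UCP", "VP", "WHADJP", "WHADVP", "WHNP", "WHPP", "X"));

    // Illustrative, hand-written bracketing of the example sentence (an assumption).
    static final String EXAMPLE =
            "(ROOT (S (NP (JJ Colorless) (JJ green) (NNS ideas)) "
            + "(VP (VBP sleep) (ADVP (RB furiously))) (. .)))";

    // A node label is the run of non-space, non-parenthesis characters after '('.
    private static final Pattern LABEL = Pattern.compile("\\(([^\\s()]+)");

    // Returns, in order of appearance, the labels in the tree that belong to 'wanted'.
    static List<String> labelsIn(String tree, Set<String> wanted) {
        List<String> found = new ArrayList<>();
        Matcher m = LABEL.matcher(tree);
        while (m.find()) {
            if (wanted.contains(m.group(1))) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("Clause level: " + labelsIn(EXAMPLE, CLAUSE_LEVEL));
        System.out.println("Phrase level: " + labelsIn(EXAMPLE, PHRASE_LEVEL));
    }
}
```

    Labels such as `JJ` or `NNS` fall in neither set because they are word-level tags, and `ROOT` is not in either list.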

  • 2020-11-27 09:36

    These tags come from the Penn Treebank Project. Look at the part-of-speech tagging ps (PostScript) document.

    JJ is adjective. NNS is noun, plural. VBP is verb, non-3rd person singular present. RB is adverb.

    That's for English. For Chinese, it's the Penn Chinese Treebank. And for German, it's the NEGRA corpus.

    1. CC Coordinating conjunction
    2. CD Cardinal number
    3. DT Determiner
    4. EX Existential there
    5. FW Foreign word
    6. IN Preposition or subordinating conjunction
    7. JJ Adjective
    8. JJR Adjective, comparative
    9. JJS Adjective, superlative
    10. LS List item marker
    11. MD Modal
    12. NN Noun, singular or mass
    13. NNS Noun, plural
    14. NNP Proper noun, singular
    15. NNPS Proper noun, plural
    16. PDT Predeterminer
    17. POS Possessive ending
    18. PRP Personal pronoun
    19. PRP$ Possessive pronoun
    20. RB Adverb
    21. RBR Adverb, comparative
    22. RBS Adverb, superlative
    23. RP Particle
    24. SYM Symbol
    25. TO to
    26. UH Interjection
    27. VB Verb, base form
    28. VBD Verb, past tense
    29. VBG Verb, gerund or present participle
    30. VBN Verb, past participle
    31. VBP Verb, non-3rd person singular present
    32. VBZ Verb, 3rd person singular present
    33. WDT Wh-determiner
    34. WP Wh-pronoun
    35. WP$ Possessive wh-pronoun
    36. WRB Wh-adverb
  • 2020-11-27 09:37

    The accepted answer above is missing the following information:

    There are also 9 punctuation tags defined (which are not listed in some references, see here). These are:

    1. #
    2. $
    3. '' (used for all forms of closing quote)
    4. ( (used for all forms of opening parenthesis)
    5. ) (used for all forms of closing parenthesis)
    6. ,
    7. . (used for all sentence-ending punctuation)
    8. : (used for colons, semicolons and ellipses)
    9. `` (used for all forms of opening quote)
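
    A common use of this list is dropping punctuation before further processing. Here is a minimal sketch, assuming the tagged text is in `word/TAG` form; `PunctuationTags`, `isPunctuation`, and `wordsWithoutPunctuation` are hypothetical names, not Stanford NLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PunctuationTags {
    // The nine punctuation tags listed above.
    static final Set<String> TAGS = new HashSet<>(Arrays.asList(
            "#", "$", "''", "(", ")", ",", ".", ":", "``"));

    // Exact match only, so PRP$ and WP$ are not mistaken for the "$" tag.
    static boolean isPunctuation(String tag) {
        return TAGS.contains(tag);
    }

    // Returns the words of a word/TAG string whose tag is not a punctuation tag.
    static List<String> wordsWithoutPunctuation(String tagged) {
        List<String> words = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                continue; // not in word/TAG form
            }
            if (!isPunctuation(token.substring(slash + 1))) {
                words.add(token.substring(0, slash));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String tagged = "Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.";
        System.out.println(wordsWithoutPunctuation(tagged));
    }
}
```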
  • 2020-11-27 09:37

    Stanford CoreNLP Tags for Other Languages: French, Spanish, German ...

    I see you are using the parser for English, which is the default model. You can use the parser for other languages (French, Spanish, German ...), but be aware that both the tokenizers and the part-of-speech taggers are different for each language. To do that, you must download the model for that language (using a build tool such as Maven, for example) and then set the model you want to use. More information about that is available here.
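
    As a rough sketch of the "set the model you want to use" step, the snippet below only builds the `java.util.Properties` object that would be handed to the pipeline. It assumes `pos.model` is the property the POS annotator reads for its model location. The model path is a placeholder, not a real path: the actual location depends on the models jar and version you downloaded. `FrenchPipelineConfig` is a name made up for this example.

```java
import java.util.Properties;

class FrenchPipelineConfig {
    // Placeholder only (an assumption): replace with the tagger path from your models jar.
    static final String FRENCH_TAGGER_PATH = "path/to/french.tagger";

    // Builds pipeline properties that point the POS annotator at a non-default model.
    static Properties frenchProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        props.setProperty("pos.model", FRENCH_TAGGER_PATH);
        return props;
    }

    public static void main(String[] args) {
        Properties props = frenchProperties();
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```

    The tokenizer usually needs language-specific settings as well, which this sketch does not cover; the properties file shipped with each language's models is the safer source for the full set.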

    Here are lists of tags for different languages:

    1. Stanford CoreNLP POS Tags for Spanish
    2. Stanford CoreNLP POS Tagger for German uses the Stuttgart-Tübingen Tag Set (STTS)
    3. Stanford CoreNLP POS tagger for French uses the following tags:

    TAGS FOR FRENCH:

    Part of Speech Tags for French

    A     (adjective)
    Adv   (adverb)
    CC    (coordinating conjunction)
    Cl    (weak clitic pronoun)
    CS    (subordinating conjunction)
    D     (determiner)
    ET    (foreign word)
    I     (interjection)
    NC    (common noun)
    NP    (proper noun)
    P     (preposition)
    PREF  (prefix)
    PRO   (strong pronoun)
    V     (verb)
    PONCT (punctuation mark)
    

    Phrasal Category Tags for French:

    AP     (adjectival phrases)
    AdP    (adverbial phrases)
    COORD  (coordinated phrases)
    NP     (noun phrases)
    PP     (prepositional phrases)
    VN     (verbal nucleus)
    VPinf  (infinitive clauses)
    VPpart (nonfinite clauses)
    SENT   (sentences)
    Sint, Srel, Ssub (finite clauses)
    

    Syntactic Functions for French:

    SUJ    (subject)
    OBJ    (direct object)
    ATS    (predicative complement of a subject)
    ATO    (predicative complement of a direct object)
    MOD    (modifier or adjunct)
    A-OBJ  (indirect complement introduced by à)
    DE-OBJ (indirect complement introduced by de)
    P-OBJ  (indirect complement introduced by another preposition)
    