Formatting NER output from Stanford CoreNLP

耶瑟儿~ 2021-01-15 16:22

I am working with Stanford CoreNLP and using it for NER. But when I extract organization names, I see that each word is tagged with the annotation separately. So, if the entity is "NE

4 Answers
  •  一向 (original poster)
    2021-01-15 17:22

    From Stanford CoreNLP 3.6 onward, you can add the entitymentions annotator to the pipeline and get a list of all entities, with multi-word entities kept together as single mentions. Here is a working example:

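    import edu.stanford.nlp.ling.CoreAnnotations.MentionsAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;
    import java.util.List;
    import java.util.Properties;
    
    public class EntityMentionsDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            // entitymentions groups adjacent tokens with the same NER tag into one mention
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, entitymentions");
            // jg-regexner.txt is the author's own RegexNER mapping file (tab-separated rules)
            props.setProperty("regexner.mapping", "jg-regexner.txt");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    
            String inputText = "I have done Bachelor of Arts and Bachelor of Laws so that I can work at British Broadcasting Corporation";
            Annotation annotation = new Annotation(inputText);
            pipeline.annotate(annotation);
    
            // each mention is a CoreMap spanning a whole (possibly multi-word) entity
            List<CoreMap> multiWordsExp = annotation.get(MentionsAnnotation.class);
            for (CoreMap multiWord : multiWordsExp) {
                String custNERClass = multiWord.get(NamedEntityTagAnnotation.class);
                // print the mention's surface text explicitly, followed by its NER class
                System.out.println(multiWord.get(TextAnnotation.class) + " : " + custNERClass);
            }
        }
    }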
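On older versions, or if you want full control over how tokens are grouped, a common workaround for the problem in the question is to merge runs of adjacent tokens that carry the same non-O tag. This is a minimal sketch in plain Java with no CoreNLP dependency; the class MergeNerTokens, its Entity type and the merge method are hypothetical names, and the word and tag lists stand in for the TextAnnotation and NamedEntityTagAnnotation values you would read from each token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): merges runs of adjacent tokens
// that share the same non-"O" NER tag into single multi-word entities.
public class MergeNerTokens {

    // One merged entity: its surface text and its NER class.
    static final class Entity {
        final String text;
        final String tag;

        Entity(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    // words.get(i) and tags.get(i) describe the same token, e.g. the
    // TextAnnotation and NamedEntityTagAnnotation of the i-th token.
    static List<Entity> merge(List<String> words, List<String> tags) {
        List<Entity> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the entity being built, if any.
                if (!currentTag.equals("O")) {
                    entities.add(new Entity(current.toString(), currentTag));
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close an entity that runs to the end of the input.
        if (!currentTag.equals("O")) {
            entities.add(new Entity(current.toString(), currentTag));
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> words = List.of("I", "can", "work", "at", "British",
                "Broadcasting", "Corporation", "in", "New", "York");
        List<String> tags = List.of("O", "O", "O", "O", "ORGANIZATION",
                "ORGANIZATION", "ORGANIZATION", "O", "LOCATION", "LOCATION");
        for (Entity e : merge(words, tags)) {
            System.out.println(e.text + " : " + e.tag);
        }
    }
}
```

Running main prints "British Broadcasting Corporation : ORGANIZATION" and then "New York : LOCATION". One limitation: because the tags have no B-/I- prefixes, two distinct entities of the same type that sit directly next to each other are merged into one.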
    
