Multi-term named entities in Stanford Named Entity Recognizer

春和景丽 2021-01-31 19:33

I'm using the Stanford Named Entity Recognizer http://nlp.stanford.edu/software/CRF-NER.shtml and it's working fine. This is

    List&         


        
8 answers
  • 2021-01-31 20:21
    List<List<CoreLabel>> out = classifier.classify(text);
    for (List<CoreLabel> sentence : out) {
        String s = "";
        String prevLabel = null;
        for (CoreLabel word : sentence) {
          String label = word.get(CoreAnnotations.AnswerAnnotation.class);
          if (prevLabel == null || prevLabel.equals(label)) {
             // Same label as the previous token: extend the current group.
             s = s + " " + word.word();
          }
          else {
            // Label changed: print the finished group unless it is background ("O").
            if (!prevLabel.equals("O"))
               System.out.println(s.trim() + '/' + prevLabel);
            s = word.word();
          }
          prevLabel = label;
        }
        // Print the last group; prevLabel is still null if the sentence was empty.
        if (prevLabel != null && !prevLabel.equals("O"))
            System.out.println(s.trim() + '/' + prevLabel);
    }
    

    I wrote a small piece of logic and it's working fine. It groups adjacent words that have the same label.
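The grouping idea can be tried without the classifier. The following is a minimal sketch, assuming the tokens and their labels are already available as two parallel lists; the class `EntityGrouper` and its methods are hypothetical names used for illustration and are not part of the Stanford NER API.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityGrouper {

    // Merges adjacent tokens that share a non-"O" label into "text/LABEL" strings.
    // tokens and labels are parallel lists (hypothetical stand-ins for CoreLabel data).
    public static List<String> group(List<String> tokens, List<String> labels) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String prevLabel = "O";

        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            if (label.equals(prevLabel)) {
                // Same label as before: extend the run, but ignore background tokens.
                if (!label.equals("O")) {
                    current.append(' ').append(tokens.get(i));
                }
            } else {
                // Label changed: emit the finished run and start a new one.
                flush(entities, current, prevLabel);
                current.append(tokens.get(i));
                prevLabel = label;
            }
        }
        // Emit a run that reaches the end of the sentence.
        flush(entities, current, prevLabel);
        return entities;
    }

    // Stores the current run if it is a real entity, then clears the buffer.
    private static void flush(List<String> entities, StringBuilder current, String label) {
        if (!label.equals("O") && current.length() > 0) {
            entities.add(current + "/" + label);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Barack", "Obama", "was", "born", "in", "Hawaii");
        List<String> labels = List.of("PERSON", "PERSON", "O", "O", "O", "LOCATION");
        System.out.println(group(tokens, labels)); // prints [Barack Obama/PERSON, Hawaii/LOCATION]
    }
}
```

One limitation of this approach: two different entities of the same type that sit directly next to each other are merged into one, because only the label is compared.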

  • 2021-01-31 20:26

    The drawback of the classifyToCharacterOffsets method is that (AFAIK) you can't access the label of the entities.

    As proposed by Christopher, here is an example of a loop which assembles "adjacent non-O things". This example also counts the number of occurrences.

    // Requires java.util.HashMap, java.util.List and
    // edu.stanford.nlp.ling.{CoreAnnotations, CoreLabel}.
    public HashMap<String, HashMap<String, Integer>> extractEntities(String text){

        HashMap<String, HashMap<String, Integer>> entities =
                new HashMap<String, HashMap<String, Integer>>();

        for (List<CoreLabel> lcl : classifier.classify(text)) {

            String answer = "O";   // label of the run being assembled
            String value = null;   // text of the run being assembled

            for (CoreLabel cl : lcl) {
                String label =
                        cl.getString(CoreAnnotations.AnswerAnnotation.class);
                String token =
                        cl.getString(CoreAnnotations.ValueAnnotation.class);

                if (label.equals(answer)) {
                    // Same label as the previous token: extend the current run.
                    if (!label.equals("O"))
                        value = value + " " + token;
                    continue;
                }

                // Label changed: record the finished run, then start a new one.
                count(entities, answer, value);
                answer = label;
                value = token;
            }

            // Record an entity that runs to the end of the sentence.
            count(entities, answer, value);
        }

        return entities;
    }

    // Increments the count of value under label; "O" runs are ignored.
    private static void count(HashMap<String, HashMap<String, Integer>> entities,
                              String label, String value) {
        if (label.equals("O"))
            return;

        if (!entities.containsKey(label))
            entities.put(label, new HashMap<String, Integer>());

        Integer n = entities.get(label).get(value);
        entities.get(label).put(value, n == null ? 1 : n + 1);
    }
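The label-to-value-to-count bookkeeping in the method above can also be written with `Map.computeIfAbsent` and `Map.merge` from the standard library (Java 9+ for `Map.of`). This is only a sketch on plain maps: the class `EntityCounts`, its methods, and the sample data are hypothetical and do not come from the classifier.

```java
import java.util.HashMap;
import java.util.Map;

public class EntityCounts {

    // Adds one occurrence of value under label, creating the inner map on first use.
    public static void increment(Map<String, Map<String, Integer>> entities,
                                 String label, String value) {
        entities.computeIfAbsent(label, k -> new HashMap<>())
                .merge(value, 1, Integer::sum);
    }

    // Returns how often value was seen under label, or 0 if it never was.
    public static int countOf(Map<String, Map<String, Integer>> entities,
                              String label, String value) {
        return entities.getOrDefault(label, Map.of()).getOrDefault(value, 0);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> entities = new HashMap<>();
        // Hypothetical sample data, not real classifier output.
        increment(entities, "PERSON", "Barack Obama");
        increment(entities, "PERSON", "Barack Obama");
        increment(entities, "LOCATION", "Hawaii");

        System.out.println(countOf(entities, "PERSON", "Barack Obama"));  // prints 2
        System.out.println(countOf(entities, "LOCATION", "Hawaii"));      // prints 1
        System.out.println(countOf(entities, "ORGANIZATION", "Stanford")); // prints 0
    }
}
```

`merge` inserts 1 for a value seen for the first time and otherwise adds 1 to the stored count, which removes the separate containsKey checks.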
    