Is it possible to train the Stanford NER system to recognize more named entity types?

小鲜肉 2021-01-30 09:32

I'm using some NLP libraries right now (Stanford and NLTK). I saw the Stanford demo, but I want to ask whether it is possible to use it to identify more entity types.

So curr

3 answers
  •  傲寒 (OP)
    2021-01-30 09:53

    It seems you want to train your own custom NER model.

    Here is a detailed tutorial with full code:

    https://dataturks.com/blog/stanford-core-nlp-ner-training-java-example.php?s=so

    Training data format

    Training data is passed as a text file in which each line is one word-label pair in the format "word\tLABEL": the word and the label name are separated by a tab ('\t'). To prepare a sentence, break it into words and add one line per word to the training file. After the last word of each sentence, add an empty line so the next sentence starts on a fresh block.
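    If your data already exists as tokenized, labeled sentences, this layout is easy to generate from code. The following is a minimal sketch, assuming tokenization has already been done; `TrainingFormat` and its `Token` type are hypothetical names for illustration, not part of Stanford NER. It uses "O" for tokens outside any entity, which is the default background label in Stanford NER.

```java
import java.util.List;

// Hypothetical helper (not part of Stanford NER): writes tokenized, labeled
// sentences in the "word<TAB>LABEL" layout, one token per line, with an
// empty line after each sentence.
final class TrainingFormat {

    // One labeled token; callers must ensure neither field contains a tab or newline.
    static final class Token {
        final String word;
        final String label;

        Token(String word, String label) {
            this.word = word;
            this.label = label;
        }
    }

    static String toTrainingFormat(List<List<Token>> sentences) {
        StringBuilder sb = new StringBuilder();
        for (List<Token> sentence : sentences) {
            for (Token t : sentence) {
                sb.append(t.word).append('\t').append(t.label).append('\n');
            }
            sb.append('\n'); // empty line ends the sentence
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<Token>> data = List.of(
            List.of(new Token("hp", "Brand"),
                    new Token("spectre", "ModelName"),
                    new Token("x360", "ModelName")),
            List.of(new Token("home", "Category"),
                    new Token("theater", "Category"),
                    new Token("system", "O")));
        // Save this string as a UTF-8 text file and point trainFile at it.
        System.out.print(toTrainingFormat(data));
    }
}
```

    Running `main` prints two sentences in this layout, each followed by an empty line.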

    Here is a sample of the input training file:

    hp	Brand
    spectre	ModelName
    x360	ModelName
    
    home	Category
    theater	Category
    system	O
    
    horizon	ModelName
    zero	ModelName
    dawn	ModelName
    ps4	O
    

    Depending on your domain, you can build such a dataset either automatically or manually. Building it manually can be really painful; an NER annotation tool can make the process much easier.
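    One common way to label data automatically is dictionary (gazetteer) matching: look up known phrases in each sentence and give every other token the "O" label. The sketch below only illustrates that idea; `GazetteerLabeler` and the phrase list are hypothetical. Dictionary labels miss names that are not in the list and can mislabel ambiguous words, so the result still needs manual review before training.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical weak-labeling helper: assigns labels by greedy longest-match
// lookup of multi-word phrases in a gazetteer (lower-cased phrase -> label).
final class GazetteerLabeler {

    // tokens: one sentence; gazetteer keys: lower-case phrases joined by single spaces.
    static List<String> label(List<String> tokens, Map<String, String> gazetteer, int maxPhraseLen) {
        List<String> labels = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String matched = null;
            int matchedLen = 0;
            // Prefer the longest phrase that starts at position i.
            for (int len = Math.min(maxPhraseLen, tokens.size() - i); len >= 1; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len)).toLowerCase(Locale.ROOT);
                String found = gazetteer.get(phrase);
                if (found != null) {
                    matched = found;
                    matchedLen = len;
                    break;
                }
            }
            if (matched == null) {
                labels.add("O"); // background label: not part of any entity
                i++;
            } else {
                for (int k = 0; k < matchedLen; k++) {
                    labels.add(matched);
                }
                i += matchedLen;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical gazetteer; keys are lower-case.
        Map<String, String> gazetteer = Map.of(
            "hp", "Brand",
            "spectre x360", "ModelName",
            "horizon zero dawn", "ModelName");
        List<String> tokens = Arrays.asList("Horizon", "Zero", "Dawn", "ps4");
        List<String> labels = label(tokens, gazetteer, 3);
        // Print in the word<TAB>LABEL training layout, then the sentence separator.
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i) + "\t" + labels.get(i));
        }
        System.out.println();
    }
}
```

    Matching is case-insensitive, but the original spelling of each token is kept in the output, since the classifier will see real text at tagging time.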

    Train model

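    import java.util.Properties;

    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.sequences.SeqClassifierFlags;
    import edu.stanford.nlp.util.StringUtils;

    public void trainAndWrite(String modelOutPath, String prop, String trainingFilepath) {
        // Load training settings (column map, features, ...) from the properties file.
        Properties props = StringUtils.propFileToProperties(prop);
        props.setProperty("serializeTo", modelOutPath);

        // If a training file path is given, use it; otherwise fall back to
        // the trainFile entry in the properties file.
        if (trainingFilepath != null) {
            props.setProperty("trainFile", trainingFilepath);
        }

        SeqClassifierFlags flags = new SeqClassifierFlags(props);
        CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
        crf.train(); // trains on the file named by trainFile

        // Save the trained model to disk.
        crf.serializeClassifier(modelOutPath);
    }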
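    `trainAndWrite` reads its settings from the properties file passed as `prop`, which is not shown here. Below is a hedged sketch that writes such a file with plain `java.util.Properties`. The keys are modeled on the example properties file in the Stanford NER FAQ: `map` says that column 0 holds the word and column 1 the label, and the remaining keys switch on feature templates. Treat the feature set as a starting point rather than a tuned configuration; `NerPropsWriter` and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper that writes a minimal Stanford NER training properties
// file. Keys follow the FAQ example; file names are placeholders.
final class NerPropsWriter {

    static Properties defaultProps(String trainFile, String modelOut) {
        Properties p = new Properties();
        p.setProperty("trainFile", trainFile);
        p.setProperty("serializeTo", modelOut);
        // Column 0 holds the word, column 1 the gold label ("word\tLABEL").
        p.setProperty("map", "word=0,answer=1");
        // CRF order: features cover at most the previous/current label pair.
        p.setProperty("maxLeft", "1");
        // Feature templates as in the FAQ example (a starting point, not tuned).
        p.setProperty("useClassFeature", "true");
        p.setProperty("useWord", "true");
        p.setProperty("useNGrams", "true");
        p.setProperty("noMidNGrams", "true");
        p.setProperty("maxNGramLeng", "6");
        p.setProperty("usePrev", "true");
        p.setProperty("useNext", "true");
        p.setProperty("useDisjunctive", "true");
        p.setProperty("useSequences", "true");
        p.setProperty("usePrevSequences", "true");
        p.setProperty("useTypeSeqs", "true");
        p.setProperty("useTypeSeqs2", "true");
        p.setProperty("useTypeySequences", "true");
        p.setProperty("wordShape", "chris2useLC");
        return p;
    }

    static void write(Properties props, Path file) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(w, "Stanford NER training settings");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("ner.prop");
        write(defaultProps("train.tsv", "ner-model.ser.gz"), file);
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

    `Properties.store` escapes the `=` signs inside the `map` value (`map=word\=0,answer\=1`); `Properties.load` reads it back unchanged, and a hand-written line `map = word=0,answer=1` is equivalent.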
    

    Use the model to generate tags:

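    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ling.CoreLabel;

    // model: a trained classifier, e.g. loaded with CRFClassifier.getClassifier(modelOutPath)
    public void doTagging(CRFClassifier<CoreLabel> model, String input) {
        input = input.trim();
        System.out.println(input + " => " + model.classifyToString(input));
    }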
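    `classifyToString` returns the input with a label attached to each token; the exact layout depends on the classifier's `outputFormat` setting. Assuming the slash-tag style (for example `hp/Brand spectre/ModelName x360/ModelName`), a small hypothetical helper like `SlashTagEntities` below can group consecutive tokens that share a non-"O" label into entities. Check the format against your own output before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups a slash-tagged string such as
// "hp/Brand spectre/ModelName x360/ModelName" into entity spans.
// Assumes whitespace-separated tokens that each end in "/LABEL",
// with "O" as the background label.
final class SlashTagEntities {

    static List<String> entities(String tagged) {
        List<String> result = new ArrayList<>();
        StringBuilder words = new StringBuilder();
        String current = null; // label of the span being built, or null

        for (String tok : tagged.trim().split("\\s+")) {
            int slash = tok.lastIndexOf('/'); // the word itself may contain '/'
            if (slash < 0) {
                continue; // not a word/LABEL token; skip it
            }
            String word = tok.substring(0, slash);
            String label = tok.substring(slash + 1);

            if (!label.equals(current)) {
                flush(result, words, current);
                current = label.equals("O") ? null : label;
            }
            if (current != null) {
                if (words.length() > 0) {
                    words.append(' ');
                }
                words.append(word);
            }
        }
        flush(result, words, current);
        return result;
    }

    // Emits the pending span (if any) as "LABEL: words" and clears the buffer.
    private static void flush(List<String> out, StringBuilder words, String label) {
        if (label != null && words.length() > 0) {
            out.add(label + ": " + words);
        }
        words.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(entities("hp/Brand spectre/ModelName x360/ModelName laptop/O"));
        // prints [Brand: hp, ModelName: spectre x360]
    }
}
```

    Because the training labels carry no B-/I- prefixes, two adjacent entities of the same type (say, two brand names in a row) come out as one span; that is a limitation of the labeling scheme rather than of the parsing.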
    

    Hope this helps.
