Train model using Named entity

余生分开走 2021-01-07 06:54

I am looking at Stanford CoreNLP, using the Named Entity Recognizer. I have different kinds of input text and I need to tag it with my own entities. So I started training my ow

2 answers
  • 2021-01-07 07:04

    The NERClassifier* is word-level: it labels individual words, not phrases. Given that, the classifier seems to be performing fine. If you want phrases treated as single units, you can join the words of each phrase with an underscore, in both your labeled training examples and your test examples, so that "Land Cruiser" becomes "Land_Cruiser".
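The underscore-joining step described above could be done as a small preprocessing pass. The following is only a minimal sketch: the class name `PhraseJoiner`, the method `joinPhrases`, and the sample sentence are hypothetical, and it assumes you already have a list of the multi-word phrases you care about.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseJoiner {
    // Replaces every occurrence of each known multi-word phrase with an
    // underscore-joined single token, e.g. "Land Cruiser" -> "Land_Cruiser".
    public static String joinPhrases(String text, List<String> phrases) {
        String result = text;
        for (String phrase : phrases) {
            result = result.replace(phrase, phrase.replace(' ', '_'));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; only "Land Cruiser" comes from the answer.
        List<String> phrases = Arrays.asList("Land Cruiser");
        System.out.println(joinPhrases("Toyota Land Cruiser 2012", phrases));
    }
}
```

The same pass would have to be applied to both the training data and any text you later classify, otherwise the model never sees the joined tokens at test time.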

  • 2021-01-07 07:16

    I believe you should also put examples of O (non-entity) tokens in your trainFile. As given, the trainFile is too simple for anything to be learned: it needs both O and PERSON examples, otherwise the model annotates everything as PERSON. You are not teaching it what your not-of-interest tokens look like. Note that Stanford NER's default background label is the capital letter O, not the digit 0. For example:

    Toyota       PERS
    of           O
    Portfolio    O
    49           O
    

    and so on.
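A training file in this token-per-line shape could be generated rather than typed by hand. The sketch below is an assumption-laden illustration, not part of Stanford NER: `TrainFileBuilder` and `toTsv` are hypothetical names, the lookup is a naive per-token map, and it assumes the background label is the capital letter `O` (Stanford NER's default) with a tab between token and label.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrainFileBuilder {
    // Assumed background label for tokens that are not entities of interest.
    static final String BACKGROUND = "O";

    // Emits one "token<TAB>label" line per token; tokens missing from the
    // map receive the background label.
    public static String toTsv(List<String> tokens, Map<String, String> entityLabels) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            String label = entityLabels.getOrDefault(token, BACKGROUND);
            sb.append(token).append('\t').append(label).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<>();
        labels.put("Toyota", "PERS");
        // Tokens taken from the example in the answer above.
        System.out.print(toTsv(Arrays.asList("Toyota", "of", "Portfolio", "49"), labels));
    }
}
```

A per-token map is the simplest possible design; real data usually needs position-aware labels, since the same word can be an entity in one sentence and background in another.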

    Also, for phrase-level recognition you should look into RegexNER, which lets you define token-pattern rules for multi-word entities. I'm using it through the API with the following code:

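    // Requires: import java.util.Properties;
    Properties props = new Properties();
    // regexner runs after ner so that its rules can refer to existing NER labels
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
    // customLocationFilename: path to the RegexNER mapping file
    props.setProperty("regexner.mapping", customLocationFilename);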
    

    with the following mapping file (customLocationFilename); in the actual file the columns are separated by tabs:

    Make Believe Town   figure of speech    ORGANIZATION
    ( /Hello/ [{ ner:PERSON }]+ )   salut   PERSON
    Bachelor of (Arts|Laws|Science|Engineering)    DEGREE
    ( /University/ /of/ [{ ner:LOCATION }] )    SCHOOL
    

    and the following input text: Hello Mary Keller was born on 4th of July and took a Bachelor of Science. Partial invoice (€100,000, so roughly 40%) for the consignment C27655 we shipped on 15th August to University of London from the Make Believe Town depot. INV2345 is for the balance.. Customer contact (Sigourney Weaver) says they will pay this on the usual credit terms (30 days).

    The output I get is:

    Hello Mary Keller is a salut
    4th of July is a DATE
    Bachelor of Science is a DEGREE
    $ 100,000 is a MONEY
    40 % is a PERCENT
    15th August is a DATE
    University of London is a ORGANIZATION
    Make Believe Town is a figure of speech
    Sigourney Weaver is a PERSON
    30 days is a DURATION
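
Phrase-level lines like the ones above can be obtained by merging consecutive tokens that share the same label. The code that printed this output is not shown in the answer, so the following is only a guess at the idea, using plain Java lists in place of CoreNLP's token annotations; `MentionGrouper` and `group` are hypothetical names, and `O` is assumed to mark non-entity tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class MentionGrouper {
    // Merges runs of adjacent tokens with the same non-"O" label into
    // lines of the form "<phrase> is a <LABEL>".
    public static List<String> group(List<String> tokens, List<String> labels) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            if (!label.equals(currentLabel)) {
                // Label changed: flush the phrase collected so far, if any.
                if (!currentLabel.equals("O")) {
                    mentions.add(current + " is a " + currentLabel);
                }
                current.setLength(0);
                currentLabel = label;
            }
            if (!label.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
            }
        }
        // Flush a phrase that runs to the end of the input.
        if (!currentLabel.equals("O")) {
            mentions.add(current + " is a " + currentLabel);
        }
        return mentions;
    }
}
```

One limitation of this approach: two distinct entities with the same label that sit directly next to each other are merged into a single phrase.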
    

    For more info on how to do this you can look at the example that got me going.
