I am looking at Stanford CoreNLP and using its Named Entity Recognizer. I have different kinds of input text and I need to tag it with my own entities, so I started training my ow
The NERClassifier* is word level: it labels words, not phrases. Given that, the classifier seems to be performing fine. If you want a phrase to be treated as a unit, you can join its words with an underscore. So in both your labeled examples and your test examples, you would turn "Land Cruiser" into "Land_Cruiser".
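As a rough illustration of the underscore trick, the sketch below joins known multi-word phrases into single tokens before the text is labeled or classified. The class name `PhraseJoiner`, the phrase list, and the sample sentence are made up for this example; none of it is part of Stanford NER, and it assumes you already know which phrases you want to join.

```java
import java.util.List;

public class PhraseJoiner {
    // Replace every occurrence of each known phrase with an underscore-joined token.
    // Hypothetical helper: plain literal string replacement, no tokenization.
    static String joinPhrases(String text, List<String> phrases) {
        String result = text;
        for (String phrase : phrases) {
            result = result.replace(phrase, phrase.replace(' ', '_'));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Toyota Land Cruiser of Portfolio 49";
        // prints: Toyota Land_Cruiser of Portfolio 49
        System.out.println(joinPhrases(text, List.of("Land Cruiser")));
    }
}
```

The same transformation has to be applied to the training data and to any text you classify later, otherwise the model never sees the joined token.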
I believe you should also put examples of 0 entities in your trainFile. As you gave it, the trainFile is too simple for any learning to happen: it needs both 0 and PERSON examples so that it doesn't annotate everything as PERSON. You're not teaching it about the entities you are not interested in. Say, like this:
Toyota PERS
of 0
Portfolio 0
49 0
and so on.
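To make the point about background examples concrete, here is a minimal sketch that turns a token list into "token, tab, label" lines, giving every token that is not a known entity the background label. `TrainFileSketch` and its method are hypothetical names, and the label `0` simply follows the example above. As far as I know Stanford NER's default background symbol is the letter O rather than the digit 0, so check which one your properties file expects.

```java
import java.util.List;
import java.util.Map;

public class TrainFileSketch {
    // Background label as used in this answer (assumption: your setup expects the same symbol).
    static final String BACKGROUND = "0";

    // Emit one "token<TAB>label" line per token; tokens without an entry get the background label.
    static String toTrainingLines(List<String> tokens, Map<String, String> entityLabels) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            String label = entityLabels.getOrDefault(token, BACKGROUND);
            sb.append(token).append('\t').append(label).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Toyota", "of", "Portfolio", "49");
        System.out.print(toTrainingLines(tokens, Map.of("Toyota", "PERS")));
    }
}
```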
Also, for phrase-level recognition you should look into regexner, where you can define patterns (patterns are good for us). I'm working on this with the API and I have the following code:
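import java.util.Properties;
Properties props = new Properties();
// regexner is listed after ner so that its patterns can refer to NER labels
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
props.setProperty("regexner.mapping", customLocationFilename);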
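with the following customLocationFilename (columns are tab-separated: the pattern, the entity type to assign, and optionally the NER types it is allowed to overwrite):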
Make Believe Town	figure of speech	ORGANIZATION
( /Hello/ [{ ner:PERSON }]+ )	salut	PERSON
Bachelor of (Arts|Laws|Science|Engineering)	DEGREE
( /University/ /of/ [{ ner:LOCATION }] )	SCHOOL
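Because the tabs in a mapping file are easy to lose when copying, one option is to generate the file from code. The sketch below writes the four rows above with explicit tab separators. `RegexNerMappingSketch` and `row` are hypothetical names, and the split into pattern, type, and overwritable-types columns is my reading of the rows and of the output further down, not something taken from the original file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RegexNerMappingSketch {
    // Build one mapping row: pattern, entity type, and optionally the types it may overwrite.
    static String row(String pattern, String type, String overwritable) {
        String line = pattern + "\t" + type;
        if (overwritable != null && !overwritable.isEmpty()) {
            line += "\t" + overwritable;
        }
        return line;
    }

    static List<String> rows() {
        return List.of(
            row("Make Believe Town", "figure of speech", "ORGANIZATION"),
            row("( /Hello/ [{ ner:PERSON }]+ )", "salut", "PERSON"),
            row("Bachelor of (Arts|Laws|Science|Engineering)", "DEGREE", null),
            row("( /University/ /of/ [{ ner:LOCATION }] )", "SCHOOL", null));
    }

    public static void main(String[] args) throws Exception {
        // Write to a temporary file; pass its path as the regexner.mapping property.
        Path out = Files.createTempFile("regexner-mapping", ".tab");
        Files.write(out, rows());
        System.out.println("Wrote " + out);
    }
}
```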
and the following input text: Hello Mary Keller was born on 4th of July and took a Bachelor of Science. Partial invoice (€100,000, so roughly 40%) for the consignment C27655 we shipped on 15th August to University of London from the Make Believe Town depot. INV2345 is for the balance.. Customer contact (Sigourney Weaver) says they will pay this on the usual credit terms (30 days).
The output I get is:
Hello Mary Keller is a salut
4th of July is a DATE
Bachelor of Science is a DEGREE
$ 100,000 is a MONEY
40 % is a PERCENT
15th August is a DATE
University of London is a ORGANIZATION
Make Believe Town is a figure of speech
Sigourney Weaver is a PERSON
30 days is a DURATION
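Since the labels are assigned per token, phrase-level lines like the ones above come from merging consecutive tokens that share a label. The sketch below shows that grouping step on plain lists. It is not the code that produced the output above: `EntityGrouper` is a hypothetical helper, it assumes `O` marks non-entity tokens, and the sample tokens and labels are hand-written. One limitation is that two adjacent entities with the same label are merged into a single phrase.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityGrouper {
    // Merge runs of consecutive tokens with the same non-background label into one phrase.
    static List<String> group(List<String> tokens, List<String> labels, String background) {
        List<String> out = new ArrayList<>();
        StringBuilder phrase = new StringBuilder();
        String current = background;
        // Loop one step past the end so the final run is flushed.
        for (int i = 0; i <= tokens.size(); i++) {
            String label = i < tokens.size() ? labels.get(i) : background;
            if (!label.equals(current)) {
                if (!current.equals(background)) {
                    out.add(phrase + " is a " + current);
                }
                phrase.setLength(0);
                current = label;
            }
            if (i < tokens.size() && !label.equals(background)) {
                if (phrase.length() > 0) phrase.append(' ');
                phrase.append(tokens.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Sigourney", "Weaver", "says", "30", "days");
        List<String> labels = List.of("PERSON", "PERSON", "O", "DURATION", "DURATION");
        group(tokens, labels, "O").forEach(System.out::println);
    }
}
```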
For more info on how to do this you can look at the example that got me going.