Question
I am using OpenNLP models for named-entity recognition.
I pass in sentences in which I want to identify entities. OpenNLP requires a String[] of tokens, so I split my string into words on spaces.
My problem is recognizing dates. If, for example, the string contains the date 7 Jan 2012 and I split it into words, "7", "Jan" and "2012" become three separate tokens. Each of them is recognized as a date, but three separate tokens are of no use to me for further processing. How can I split my string so that "7 Jan 2012" is treated as a single string? 7 Jan 2012 is one format; sometimes the date is written as Jan 7,2012. The date model also tags the time strings I pass in, such as 12:18pm.
The NER time model does not recognize the time in 12:18pm or 09:52:52. What time formats does it accept?
Answer 1:
The Apache OpenNLP date and time models are statistical, trained from a corpus. They recognize dates and times from the surrounding context, not only from the format.
If you have specific needs, you can create your own corpus and train your own OpenNLP Name Finder model.
The OpenNLP Name Finder also supports some customization during training. If you create a corpus and also add some regex-based features, you may be able to improve your results.
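If the formats you care about are as regular as the ones in the question, a plain regex pass over the raw string (before splitting on spaces) may be enough to pull out each date or time as a single string. The following is only a minimal sketch using java.util.regex; the class name DateTimeRegexSketch, the method findAll and the patterns themselves are hypothetical and are not part of OpenNLP. It is a standalone pre-pass, not the regex-based training features mentioned above, and it covers only the formats 7 Jan 2012, Jan 7,2012, 12:18pm and 09:52:52.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateTimeRegexSketch {
    // Abbreviated English month names; the trailing [a-z]* also lets
    // full names such as "January" through (assumption: English text only).
    private static final String MONTH =
        "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*";

    // "7 Jan 2012"
    private static final String DAY_MONTH_YEAR = "\\d{1,2}\\s+" + MONTH + "\\s+\\d{4}";

    // "Jan 7,2012" or "Jan 7, 2012"
    private static final String MONTH_DAY_YEAR = MONTH + "\\s+\\d{1,2},\\s*\\d{4}";

    // "12:18pm" or "09:52:52" (seconds and am/pm are both optional)
    private static final String TIME = "\\d{1,2}:\\d{2}(?::\\d{2})?(?:\\s*[AaPp][Mm])?";

    private static final Pattern DATE_OR_TIME = Pattern.compile(
        "\\b(?:" + DAY_MONTH_YEAR + "|" + MONTH_DAY_YEAR + "|" + TIME + ")\\b");

    // Returns every date or time found in the text, each as one string,
    // in the order they appear.
    public static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = DATE_OR_TIME.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Logged 7 Jan 2012 at 12:18pm, then Jan 7,2012 at 09:52:52";
        // prints [7 Jan 2012, 12:18pm, Jan 7,2012, 09:52:52]
        System.out.println(findAll(text));
    }
}
```

A pass like this only checks the shape of the text, so it will not use context the way the statistical models do, and it does not validate that the day or hour values are in range.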
Source: https://stackoverflow.com/questions/10419337/opennlp-name-entity-recognition-model-for-time-and-date