I am trying to use the Stanford NLP tool to extract dates (8/11/2012) from text.
Here's a link to the demo of this tool.
Can you help me with how to train the c
In sutime/english.sutime.txt, at line 319, there are a few patterns for US date tagging:
```
{ ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
{ ruleType: "time", pattern: /yyyy-MM-dd/ }
{ ruleType: "time", pattern: /'T'HH(:?mm(:?ss(.,)?)?)?(Z)?/ }
// Tokenizer "sometimes adds extra slash
{ ruleType: "time", pattern: /yyyy\?/MM\?/dd/ }
{ ruleType: "time", pattern: /MM?\?/dd?\?/(yyyy|yy)/ }
{ ruleType: "time", pattern: /MM?-dd?-(yyyy|yy)/ }
{ ruleType: "time", pattern: /HH?:mm(:ss)?/ }
{ ruleType: "time", pattern: /yyyy-MM/ }
```
You just need to add a few more rules like these to get the field order you need.
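To see why the field order matters for a string like 8/11/2012, here is a small Java sketch that uses java.time (not SUTime) to read the same text month-first, which is the shape of the MM?/dd?/(yyyy|yy) rule above, and day-first, which is what an extra rule with the fields swapped would give you. The class name FieldOrderDemo is just for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FieldOrderDemo {
    // Month-first (US) order, the same field order as the MM?/dd?/(yyyy|yy) rule.
    static final DateTimeFormatter US = DateTimeFormatter.ofPattern("M/d/yyyy");
    // Day-first order, which SUTime would only produce if you add a rule for it.
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("d/M/yyyy");

    public static void main(String[] args) {
        String text = "8/11/2012";
        // Same characters, two different calendar dates depending on field order.
        LocalDate monthFirst = LocalDate.parse(text, US);
        LocalDate dayFirst = LocalDate.parse(text, DAY_FIRST);
        System.out.println("month-first: " + monthFirst);
        System.out.println("day-first:   " + dayFirst);
    }
}
```

The month-first reading gives 2012-08-11 and the day-first reading gives 2012-11-08, so the order you want has to be encoded explicitly in the rules rather than guessed.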
Using the NLP tool to extract dates from text seems like overkill if this is all you are trying to accomplish. You should consider other options, such as a simple Java regular expression (e.g. here).
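A minimal sketch of the regex approach, assuming the dates are written as numbers separated by slashes (1 or 2 digits, 1 or 2 digits, then a 2- or 4-digit year). The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {
    // 1-2 digits / 1-2 digits / 4 or 2 digit year, e.g. 8/11/2012 or 9/3/12.
    // The \b anchors stop matches from starting or ending inside longer digit runs.
    private static final Pattern DATE =
            Pattern.compile("\\b(\\d{1,2})/(\\d{1,2})/(\\d{4}|\\d{2})\\b");

    // Returns every substring of text that has the date shape, in order.
    static List<String> extractDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "The meeting moved from 8/11/2012 to 9/3/12, not 123/45/6789.";
        System.out.println(extractDates(text));
    }
}
```

This only checks the shape of the text: it also accepts impossible values like 99/99/99 and cannot tell month-first from day-first, so validate the matches afterwards (for example with java.time) if that matters.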
If you are doing something that requires more features from the Stanford NLP tool, take a look at the SUTime annotator. Its demo page will let you get a feel for how it behaves. Make sure to check the "Read rules from file" option, and you will see that your date gets annotated.
Usage:
SUTime annotations are provided automatically by the StanfordCoreNLP pipeline when you include the ner annotator.
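With the ner annotator, SUTime's results show up as NER tags on individual tokens (DATE, TIME, DURATION and so on), so you may want to stitch multi-token dates back together. A minimal post-processing sketch follows; the words and tags are hard-coded and hypothetical, standing in for what you would read off each token (e.g. via CoreLabel.word() and CoreLabel.ner()), and the class name DateSpans is just for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DateSpans {
    // Joins runs of consecutive tokens tagged "DATE" into single strings.
    // words and tags are parallel lists (hypothetical input for illustration).
    static List<String> dateSpans(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> spans = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            if ("DATE".equals(tags.get(i))) {
                if (current.length() > 0) current.append(' ');
                current.append(words.get(i));
            } else if (current.length() > 0) {
                // A non-DATE token closes the current span.
                spans.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) spans.add(current.toString()); // span ending at the last token
        return spans;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Shipped", "on", "8/11/2012", "and", "returned", "last", "week", ".");
        List<String> tags = Arrays.asList("O", "O", "DATE", "O", "O", "DATE", "DATE", "O");
        System.out.println(dateSpans(words, tags));
    }
}
```

Joining with single spaces loses the original spacing; if you need the exact source text, use the tokens' character offsets instead.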
You can certainly train the CRF-based NER to recognize dates and times. You can see an example of that by running the supplied english.muc.7class.distsim.crf.ser.gz model. See the FAQ for training NER systems. But note that our primary tool for time/date recognition is now regex-based: SUTime. You can also write rules for SUTime for other applications. See the SUTime page and the link to TokensRegex on that page.
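If you do go the CRF route, the training data is a token-per-line, tab-separated file with the word in one column and its label in another, which the properties file maps to columns (the FAQ's example uses a setting along the lines of map = word=0,answer=1). The sketch below only writes sentences in that two-column layout; the class name, the DATE/O labels and the blank line after each sentence are assumptions for illustration, so check the FAQ for the exact conventions your version expects.

```java
import java.util.Arrays;
import java.util.List;

public class NerTrainingFormat {
    // Formats one sentence as "word<TAB>label" lines, one token per line.
    // The trailing blank line as a sentence separator is an assumption.
    static String toTsv(List<String> words, List<String> labels) {
        if (words.size() != labels.size()) {
            throw new IllegalArgumentException("words and labels must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            sb.append(words.get(i)).append('\t').append(labels.get(i)).append('\n');
        }
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical labelled sentence; real training data would be hand-annotated.
        System.out.print(toTsv(
                Arrays.asList("Delivered", "on", "8/11/2012", "."),
                Arrays.asList("O", "O", "DATE", "O")));
    }
}
```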