Date Extraction from Text


暖寄归人

I am trying to use the Stanford NLP tool to extract dates (e.g. 8/11/2012) from text.

Here's a link to the demo of this tool.

Can you help me with how to train the c


3 Answers

耶瑟儿～

2021-01-19 04:58

In sutime/english.sutime.txt, at line 319, there are a few patterns for tagging US-style dates:

{ ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ } 
{ ruleType: "time", pattern: /yyyy-MM-dd/ }  
{ ruleType: "time", pattern: /'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
// Tokenizer "sometimes adds extra slash  
{ ruleType: "time", pattern: /yyyy\\?\/MM\\?\/dd/ }
{ ruleType: "time", pattern: /MM?\\?\/dd?\\?\/(yyyy|yy)/ }
{ ruleType: "time", pattern: /MM?-dd?-(yyyy|yy)/ } 
{ ruleType: "time", pattern: /HH?:mm(:ss)?/ }
{ ruleType: "time", pattern: /yyyy-MM/ }

You just need to add a few rules of the same kind, with the date fields in the order you need.
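
To see why the field order matters for a string like 8/11/2012, here is a minimal sketch using the standard java.time API rather than SUTime itself. It reads "the needed order" as field order (month-first versus day-first), which is an interpretation of the answer above; the class name `DateOrderSketch` and its helper are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateOrderSketch {
    // US order: month/day/year (the order the rules quoted above assume).
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("M/d/uuuu");
    // Day-first order: day/month/year.
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("d/M/uuuu");

    // Hypothetical helper: parse the same text under a chosen field order.
    static LocalDate parse(String text, DateTimeFormatter order) {
        return LocalDate.parse(text, order);
    }

    public static void main(String[] args) {
        String raw = "8/11/2012";
        System.out.println(parse(raw, MONTH_FIRST)); // 2012-08-11
        System.out.println(parse(raw, DAY_FIRST));   // 2012-11-08
    }
}
```

The same text yields two different dates, so a rule set written for one order will silently mislabel dates written in the other.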


深忆病人

2021-01-19 05:05
Using the NLP tool to extract dates from text seems like overkill if that is all you are trying to accomplish. You should consider other options, such as a simple Java regular expression (e.g. here).
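
As a rough idea of what the regular-expression route could look like, here is a minimal sketch using only java.util.regex. The pattern covers just slash-separated numeric dates like the one in the question, and it does not check that the month and day values are valid. The class name `DateRegexSketch` and the method `extractDates` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateRegexSketch {
    // One or two digits, a slash, one or two digits, a slash, then a 4- or 2-digit year.
    // The lookarounds stop a match from starting or ending inside a longer run of digits.
    private static final Pattern SLASH_DATE =
        Pattern.compile("(?<!\\d)(\\d{1,2})/(\\d{1,2})/(\\d{4}|\\d{2})(?!\\d)");

    // Returns every slash-separated date found in the text, in order of appearance.
    static List<String> extractDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SLASH_DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [8/11/2012, 9/1/12]
        System.out.println(extractDates("The meeting on 8/11/2012 was moved to 9/1/12."));
    }
}
```

This only finds the text of each date. Deciding whether 8/11/2012 means August 11 or 8 November still requires knowing the convention the text was written in.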

If you are doing something that requires more features from the Stanford NLP tool, take a look at the SUTime annotator. Their demo page will let you get a feel for how it behaves. Make sure to check the option "Read rules from file" and you will see that your date gets annotated.

Usage:
SUTime annotations are provided automatically with the StanfordCoreNLP pipeline by including the ner annotator.
旧巷少年郎

2021-01-19 05:12

You can certainly train the CRF-based NER to recognize dates and times. You can see an example of that by running the supplied english.muc.7class.distsim.crf.ser.gz model. See the FAQ for training NER systems. But note that our primary tool for time/date recognition is now regex based: SUTime. You can also write rules for SUTime for other applications. See the SUTime page and the link to TokensRegex on that page.
