Date Extraction from Text

暖寄归人 2021-01-19 04:54

I am trying to use the Stanford NLP tool to extract dates (e.g. 8/11/2012) from text.

Here's a link to the demo of this tool.

Can you help me with how to train the c

3 Answers
  • 2021-01-19 04:58

    In sutime/english.sutime.txt, at line 319, there are a few patterns for US-style date tagging:

    { ruleType: "time", pattern: /yyyy-?MM-?dd-?'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
    { ruleType: "time", pattern: /yyyy-MM-dd/ }
    { ruleType: "time", pattern: /'T'HH(:?mm(:?ss([.,]S{1,3})?)?)?(Z)?/ }
    // Tokenizer sometimes adds extra slash
    { ruleType: "time", pattern: /yyyy\\?\/MM\\?\/dd/ }
    { ruleType: "time", pattern: /MM?\\?\/dd?\\?\/(yyyy|yy)/ }
    { ruleType: "time", pattern: /MM?-dd?-(yyyy|yy)/ }
    { ruleType: "time", pattern: /HH?:mm(:ss)?/ }
    { ruleType: "time", pattern: /yyyy-MM/ }
    

    You just need to add a few rules of this ruleType so that the fields are matched in the order you need (for example, day before month).
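To see why the field order in such a rule matters, here is a minimal sketch in plain Java. It does not use SUTime at all; it relies on `java.time` to show that the question's example string resolves to two different dates depending on whether the month or the day is read first. The class name `DateOrderSketch` and its members are hypothetical names chosen for this illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustration only: plain java.time, not SUTime.
final class DateOrderSketch {
    // US-style order: month, then day (analogous to MM?/dd?/yyyy).
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("M/d/uuuu");
    // Day-first order, as used in many other locales.
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("d/M/uuuu");

    // Parses the text with the given field order; throws DateTimeParseException on bad input.
    static LocalDate parse(String text, DateTimeFormatter order) {
        return LocalDate.parse(text, order);
    }

    public static void main(String[] args) {
        System.out.println(parse("8/11/2012", MONTH_FIRST)); // 2012-08-11
        System.out.println(parse("8/11/2012", DAY_FIRST));   // 2012-11-08
    }
}
```

The same ambiguity applies to any rule-based tagger: a pattern written month-first will silently normalize day-first input to the wrong date unless a rule for the other order is added.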

  • 2021-01-19 05:05

    Using an NLP tool to extract dates from text seems like overkill if that is all you are trying to accomplish. Consider simpler options first, such as a plain Java regular expression.
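A rough sketch of the regular-expression route, assuming the dates of interest are purely numeric like 8/11/2012. The class name `DateRegexSketch`, the method `extractDates`, and the pattern itself are illustrative choices, not something taken from the Stanford tools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: finds numeric dates such as 8/11/2012 or 12-01-13.
final class DateRegexSketch {
    // 1-2 digits, a separator ('/' or '-'), 1-2 digits, the same separator again,
    // then a 4- or 2-digit year. The lookarounds keep the match from starting or
    // ending inside a longer run of digits. Value ranges are NOT validated,
    // so a string like 99/99/2012 would still match.
    private static final Pattern NUMERIC_DATE =
        Pattern.compile("(?<!\\d)(\\d{1,2})([/-])(\\d{1,2})\\2(\\d{4}|\\d{2})(?!\\d)");

    // Returns every substring of the text that looks like a numeric date.
    static List<String> extractDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMERIC_DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [8/11/2012, 12-01-13]
        System.out.println(extractDates("Filed on 8/11/2012, revised 12-01-13."));
    }
}
```

This only locates candidate strings. It cannot tell whether 8/11/2012 means August 11 or November 8, and it will miss written-out dates such as "11 August 2012", which is where a tool like SUTime becomes worthwhile.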

    If you are doing something that requires more features from the Stanford NLP tool, take a look at the SUTime annotator. Their demo page will let you get a feel for how it behaves. Make sure to check the option "Read rules from file", and you will see that your date gets annotated.

    Usage:

    SUTime annotations are provided automatically with the StanfordCoreNLP pipeline by including the ner annotator.
    
  • 2021-01-19 05:12

    You can certainly train the CRF-based NER to recognize dates and times. You can see an example of that by running the supplied english.muc.7class.distsim.crf.ser.gz model, whose classes include dates and times. See the FAQ for training NER systems. But note that our primary tool for time/date recognition is now regex-based: SUTime. You can also write rules for SUTime for other applications. See the SUTime page and the link to TokensRegex on that page.
