Finding a DOI in a document or page

悲&欢浪女 2021-01-29 21:43

The DOI system places almost no useful restrictions on what constitutes a valid identifier. Even so, being able to pull DOIs out of PDFs, web pages, and similar sources is quite useful.

7 answers
  •  礼貌的吻别
    2021-01-29 22:08

    Here is my go at it:

    (10[.][0-9]{4,}[^\s"/<>]*/[^\s"<>]+)
    

    And a couple of valid edge cases that this pattern handles correctly, though others seem to fail on them:

    • 10.1007/978-3-642-28108-2_19
    • 10.1007.10/978-3-642-28108-2_19 (fictitious example, see @Ju9OR comment)
    • 10.1016/S0735-1097(98)00347-7
    • 10.1579/0044-7447(2006)35[89:RDUICP]2.0.CO;2
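
A quick way to check the pattern against the edge cases listed above is to wrap each one in a bit of markup and see what comes back. The following is only a minimal sketch in Python: the names `DOI_PATTERN` and `find_dois` and the sample anchor tag are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern copied verbatim from the answer above.
# DOI_PATTERN and find_dois are hypothetical names chosen for this sketch.
DOI_PATTERN = re.compile(r'(10[.][0-9]{4,}[^\s"/<>]*/[^\s"<>]+)')


def find_dois(text):
    """Return every substring of `text` matched by DOI_PATTERN, in order."""
    return DOI_PATTERN.findall(text)


# The edge cases listed in the answer.
samples = [
    "10.1007/978-3-642-28108-2_19",
    "10.1007.10/978-3-642-28108-2_19",
    "10.1016/S0735-1097(98)00347-7",
    "10.1579/0044-7447(2006)35[89:RDUICP]2.0.CO;2",
]

for doi in samples:
    # Embed each DOI in a made-up anchor tag to mimic a web page.
    html = f'<a href="https://doi.org/{doi}">link</a>'
    print(find_dois(html))
```

Because the final character class only excludes whitespace, quotes, and angle brackets, a DOI that ends a sentence in running prose would likely be returned with the trailing period or comma attached, so some post-processing may be needed there.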

    It also correctly discards some false positives in (X|HT)ML markup, such as:
