The DOI system places almost no useful restrictions on what an identifier can look like. Even so, being able to pull DOIs out of PDFs, web pages, etc. is quite useful.
@Silas The sanity checking is a good idea. However, the regex doesn't cover all DOIs. The first element must (currently) be 10, and the second element must (currently) be numeric, but the third element is barely restricted at all:
"Legal characters are the legal graphic characters of Unicode. This specifically excludes the control character ranges 0x00-0x1F and 0x80-0x9F..."
and that's where the real problem lies. In practice, I've never seen whitespace used, but the spec specifically allows for it, so there doesn't seem to be a sensible way of detecting where a DOI ends in running text.
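
For what it's worth, here is a rough Python sketch of the kind of heuristic extraction this leaves you with. It is not the regex discussed above and it is not derived from the spec: the pattern, the `TRAILING` set, and the `find_doi_candidates` name are all my own assumptions. It simply assumes a DOI ends at the first whitespace character and that trailing punctuation belongs to the surrounding sentence, which is exactly the guess the spec does not let you make safely.

```python
import re

# Heuristic pattern (an assumption, not taken from the DOI spec):
# "10." + a numeric second element (optionally with dotted subdivisions)
# + "/" + a run of non-whitespace characters as the suffix.
DOI_CANDIDATE = re.compile(r'\b10\.\d+(?:\.\d+)*/\S+')

# Punctuation that usually belongs to the surrounding text rather than the DOI.
# Stripping it is a guess: a real DOI may legitimately end in one of these.
TRAILING = '.,;:)]}>"\''


def find_doi_candidates(text):
    """Return likely DOIs in text. Results may be truncated or over-extended."""
    return [m.group(0).rstrip(TRAILING) for m in DOI_CANDIDATE.finditer(text)]


if __name__ == "__main__":
    # Sample strings for illustration only.
    print(find_doi_candidates("See doi:10.1000/182, and (10.1234/example.suffix)."))
    # A suffix containing whitespace is cut off at the first space.
    print(find_doi_candidates("10.1000/abc def"))
```

The second call shows the limitation: a suffix that really did contain a space would be silently truncated, and a DOI that really ended in `)` or `.` would lose its last character. So treat the output as candidates to check against a resolver, not as validated DOIs.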