The DOI system places basically no useful limitations on what constitutes a reasonable identifier. However, being able to pull DOIs out of PDFs, web pages, etc. is quite useful.
I'm sure it's not super-helpful for the OP at this point, but I figured I'd post what I am trying in case anyone else like me stumbles upon this:
(10\.(\d)+/(\S)+)
This matches: "10 dot number slash anything-not-whitespace"
But for my use (scraping HTML), this was finding false positives, so I had to match the above while also excluding quotes and greater-than/less-than signs:
(10\.(\d)+/([^\s"<>])+)
I'm still testing these out, but I'm feeling hopeful thus far.
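In case it helps, here is a minimal Python sketch of the HTML-scraping variant. It assumes the dot is escaped so it only matches a literal period, and that parentheses are allowed in the suffix (some real DOIs contain them). The names `DOI_PATTERN` and `find_dois` are just made up for illustration, and stripping trailing punctuation is my own assumption about what counts as a false positive, not anything the DOI system specifies.

```python
import re

# "10", a literal dot, digits, a slash, then a suffix that stops at
# whitespace, double quotes, or angle brackets.
DOI_PATTERN = re.compile(r'10\.\d+/[^\s"<>]+')


def find_dois(text):
    """Return candidate DOIs in order of first appearance, without duplicates."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Assumption: sentence punctuation at the very end is not part of the DOI.
        candidate = match.group(0).rstrip(".,;:")
        if candidate not in found:
            found.append(candidate)
    return found


html = '<a href="https://doi.org/10.1000/xyz123">see 10.1000/xyz123.</a>'
print(find_dois(html))
```

Since the pattern has no anchor before the `10`, it can still match inside a longer number, so treat the results as candidates to verify rather than confirmed DOIs.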