Font cannot be extracted by PDFMiner

Question

I am converting some PDF reports to plain text using PDFMiner. For a number of my input PDFs, the output contains only a couple of recognised lines followed by a run of (cid:%d) placeholders, like this:

Inspection report

(cid:4)(cid:5)(cid:6)(cid:7)(cid:8)(cid:9) (cid:10)(cid:9)(cid:11)(cid:9)(cid:12)(cid:9)(cid:5)(cid:13)(cid:9) (cid:14)(cid:8)(cid:15)(cid:16)(cid:9)(cid:12) (cid:17)(cid:18)(cid:13)(cid:19)(cid:20) (cid:21)(cid:8)(cid:22)(cid:23)(cid:18)(cid:12)(cid:6)(cid:22)(cid:24) (cid:25)(cid:5)(cid:26)(cid:27)(cid:9)(cid:13)(cid:22)(cid:6)(cid:18)(cid:5) (cid:5)(cid:8)(cid:15)(cid:16)(cid:9)(cid:12)
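
To get a feel for how much of a converted file is affected, I put together a small check that counts the placeholders in the extracted text. This is only a sketch using the standard library: `cid_stats` is a name I made up, not part of PDFMiner, and it assumes the placeholders always have the exact form (cid:N).

```python
import re

# Matches the "(cid:N)" placeholder that shows up in the extracted text.
CID_TOKEN = re.compile(r"\(cid:(\d+)\)")


def cid_stats(text):
    """Return (placeholder_count, sorted_distinct_cids) for extracted text."""
    cids = [int(n) for n in CID_TOKEN.findall(text)]
    return len(cids), sorted(set(cids))


# A shortened piece of the output shown above.
sample = "Inspection report\n(cid:4)(cid:5)(cid:6)(cid:7)(cid:8)(cid:9) (cid:10)(cid:9)"
count, distinct = cid_stats(sample)
print(count, distinct)
```

A high count with only a few dozen distinct values would be consistent with ordinary text whose glyphs simply have no Unicode mapping.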

Looking into it, I think the problem is that the bulk of the document is set in a font that resists extraction. Debugging has been strange because the font's reported properties seemed to change overnight (don't ask how, they just did).

I'm not sure which of these is significant, but today the font has the following properties:

- name = 'font0000000018f29a3e'
- cidcoding = 'Adobe-Identity'
- unicode_map = 'UnicodeMap: /Adobe-Identity-UCS'
- unicode_map.cid2unichr = {}
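
My working theory is that the empty cid2unichr is the culprit: as far as I can tell, PDFMiner writes the literal (cid:N) text whenever a character ID has no entry in the font's Unicode map. The toy model below shows that lookup. It is not PDFMiner's actual code, and `decode_cids` is a hypothetical helper used only for illustration.

```python
def decode_cids(cids, cid2unichr):
    """Map character IDs to text, falling back to a "(cid:N)" placeholder.

    cid2unichr is a plain dict from integer CID to a Unicode string,
    standing in for the font's Unicode map (an assumption of this sketch).
    """
    out = []
    for cid in cids:
        if cid in cid2unichr:
            out.append(cid2unichr[cid])
        else:
            # No mapping is known for this glyph, so emit the placeholder.
            out.append("(cid:%d)" % cid)
    return "".join(out)


# With an empty map, every character falls through to the placeholder.
empty_map_result = decode_cids([4, 5, 6], {})
# With a partial map, only the unmapped characters do.
partial_map_result = decode_cids([4, 5, 6], {4: "R", 5: "e"})
print(empty_map_result)
print(partial_map_result)
```

If that model is right, the text can only be recovered by supplying the missing CID-to-Unicode mapping from somewhere else, since the PDF itself does not seem to carry one.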

I'm using Python 2.7 on a Mac and have tried a few things