问题
I am trying to convert a PDF to a JPG with a PHP exec()
call, which looks like this:
convert page.pdf -resize 716x716 page.jpg
For some reason, the JPG comes out with janky text, despite the PDF looking just fine in Acrobat and Mac Preview. Here is the original PDF:
http://whit.info/dev/conversion/page.pdf
and here is the janktastic output:
http://whit.info/dev/conversion/page.jpg
The server is a LAMP stack with PHP 5 and ImageMagick 6.2.8.
Can you help this stumped Geek?
Thanks in advance,
Whit
回答1:
ImageMagick is just going to call out to Ghostscript to convert this PDF to an image. If you run gs
on the pdf, you get the same badly-spaced output.
I suspect Ghostscript isn't handling the PDF's embedded TrueType fonts very well. If you could change your output to either embed Type 1 fonts or use a "core" PostScript font, you'd get better results.
回答2:
I suspect its an encoding/widths issue. Both are a tad off, though I can't put my finger on why.
Here are some suspects:
First
The text stream is defined in UTF-16 LE. charNULLcharNULL, using the normal string drawing command syntax:
(some text) Tj
There's a way to escape any old character value into a () string. You can also define strings in hex thusly:
<203245> Tj
Neither method are used, just the questionable inline nulls. That could cause an issue in GS if it's trying to work with pointers to char without lengths associated with them.
Second
The widths array is dumb. You can define widths in groups thusly:
[ 32 [450 525 500] 37 [600 250] 40 [0] ]
This defines
32: 450
33: 525
34: 500
37: 600
38: 250
40: 0
These fonts defines their consecutive widths in individual arrays. Not illegal, but definitely wasteful/stupid, and if GS were coded to EXPECT gaps between the arrays, it could induce a bug.
There's also some extremely fishy values in the array. 32 through 126 are defined consecutively, but then it starts jumping all over: ...126 [600] 8364 [500] 8216 [222] 402 [500] 8222 [389]. 8230 [1000] 8224 [444]..
. and then goes back to being consecutive from 160 to 255.
Just weird.
Third
I'm not even remotely sure, but the CIDToGIDMap stream contains an AWEFUL lot of nulls.
Bottom line
Those fonts are fishy. And I've never heard of "Bellflower Books" or "UFPDF 0.1"
That version number makes me cringe. It should make you cringe too.
Googleing for "UFPDF" I found this note from the author:
Note: I wrote UFPDF as an experiment, not as a finished product. If you have problems using it, don't bug me for support. Patches are welcome though, but I don't have much time to maintain this.
UFPDF is a PHP library that sits on top of FPDF. 0.1. Just run away.
来源:https://stackoverflow.com/questions/5268909/converting-a-pdf-to-jpg-with-imagemagick-in-php-gives-odd-letter-spacing