Converting a PDF to JPG with ImageMagick in PHP Gives Odd Letter Spacing

Submitted by 一曲冷凌霜 on 2021-01-27 07:27:03

Question


I am trying to convert a PDF to a JPG with a PHP exec() call, which looks like this:

convert page.pdf -resize 716x716 page.jpg

For some reason, the JPG comes out with janky text, despite the PDF looking just fine in Acrobat and Mac Preview. Here is the original PDF:

http://whit.info/dev/conversion/page.pdf

and here is the janktastic output:

http://whit.info/dev/conversion/page.jpg

The server is a LAMP stack with PHP 5 and ImageMagick 6.2.8.

Can you help this stumped Geek?

Thanks in advance,

Whit


Answer 1:


ImageMagick is just going to call out to Ghostscript to convert this PDF to an image. If you run gs on the pdf, you get the same badly-spaced output.
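To reproduce this without ImageMagick in the way, Ghostscript can be driven directly. The following is only a rough sketch: it assumes a `gs` binary is on the PATH, the function names and the 150 dpi default are illustrative choices rather than anything from the question, and it writes a single output file, so it suits a one-page PDF like the one here.

```python
import shutil
import subprocess


def build_gs_command(pdf_path, jpg_path, dpi=150):
    """Return the Ghostscript argument list for rendering a PDF to JPEG."""
    return [
        "gs",
        "-dNOPAUSE",                 # do not wait for a keypress between pages
        "-dBATCH",                   # exit once the input has been processed
        "-dSAFER",                   # restrict file access from within the PDF
        "-sDEVICE=jpeg",             # JPEG output device
        f"-r{dpi}",                  # rendering resolution in dpi (assumed value)
        f"-sOutputFile={jpg_path}",
        pdf_path,
    ]


def render_with_gs(pdf_path, jpg_path, dpi=150):
    """Run Ghostscript; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    subprocess.run(build_gs_command(pdf_path, jpg_path, dpi), check=True)
```

Calling `render_with_gs("page.pdf", "page.jpg")` should give the same badly spaced text as the `convert` call, which would point at Ghostscript rather than ImageMagick.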

I suspect Ghostscript isn't handling the PDF's embedded TrueType fonts very well. If you could change your output to either embed Type 1 fonts or use a "core" PostScript font, you'd get better results.




Answer 2:


I suspect it's an encoding/widths issue. Both are a tad off, though I can't put my finger on why.

Here are some suspects:

First

The text is written as UTF-16 LE (char, NULL, char, NULL), using the normal string drawing command syntax:

(some text) Tj

There's a way to escape any old character value into a () string. You can also define strings in hex thusly:

<203245> Tj

Neither method is used, just the questionable inline nulls. That could cause an issue in GS if it's trying to work with pointers to char without lengths associated with them.
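For illustration, here is roughly what the two safer encodings look like for the same bytes. This is a sketch with hypothetical helper names and made-up sample text, not what UFPDF does; it only shows how UTF-16 LE bytes, nulls included, could be emitted as a hex string or as an escaped literal string.

```python
def to_pdf_hex_string(data: bytes) -> str:
    """Encode raw bytes as a PDF hex string, e.g. <203245>."""
    return "<" + data.hex().upper() + ">"


def to_pdf_literal_string(data: bytes) -> str:
    """Encode raw bytes as a PDF literal string, escaping anything unsafe."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in "()\\":
            out.append("\\" + ch)        # parentheses and backslash need a backslash
        elif 32 <= b < 127:
            out.append(ch)               # printable ASCII goes through unchanged
        else:
            out.append("\\%03o" % b)     # everything else becomes a \ddd octal escape
    return "(" + "".join(out) + ")"


raw = "Hi".encode("utf-16-le")       # b'H\x00i\x00', the char-NULL pattern
print(to_pdf_hex_string(raw))        # <48006900>
print(to_pdf_literal_string(raw))    # (H\000i\000)
print(to_pdf_hex_string(b" 2E"))     # <203245>
```

Either form keeps raw null bytes out of the content stream, so a parser never has to guess where the string ends.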

Second

The widths array is dumb. You can define widths in groups thusly:

[ 32 [450 525 500] 37 [600 250] 40 [0] ]

This defines
32: 450
33: 525
34: 500
37: 600
38: 250
40: 0

These fonts define their consecutive widths in individual arrays. Not illegal, but definitely wasteful/stupid, and if GS were coded to EXPECT gaps between the arrays, it could induce a bug.
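To show that the grouped form and the one-array-per-width form describe the same table, here is a small sketch that expands a widths array into a code-to-width mapping. The function name is hypothetical and the array is modelled as a nested Python list; it also accepts the `first last width` range form that the PDF widths syntax allows.

```python
def expand_widths(w):
    """Expand a PDF-style widths array into a {code: width} dict."""
    widths = {}
    i = 0
    while i < len(w):
        first = w[i]
        if isinstance(w[i + 1], list):
            # Form: first [w1 w2 ...] assigns widths to consecutive codes.
            for offset, width in enumerate(w[i + 1]):
                widths[first + offset] = width
            i += 2
        else:
            # Form: first last width assigns one width to a whole range.
            last, width = w[i + 1], w[i + 2]
            for code in range(first, last + 1):
                widths[code] = width
            i += 3
    return widths


grouped = [32, [450, 525, 500], 37, [600, 250], 40, [0]]
print(expand_widths(grouped))
# {32: 450, 33: 525, 34: 500, 37: 600, 38: 250, 40: 0}

# One array per code, as in the problem PDF, expands to the same mapping.
one_per_code = [32, [450], 33, [525], 34, [500]]
print(expand_widths(one_per_code) == expand_widths([32, [450, 525, 500]]))  # True
```

A reader that handles both spellings identically, as this one does, would not care which one the generator chose; the worry above is only that GS might not.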

There are also some extremely fishy values in the array. 32 through 126 are defined consecutively, but then it starts jumping all over: ...126 [600] 8364 [500] 8216 [222] 402 [500] 8222 [389] 8230 [1000] 8224 [444]... and then goes back to being consecutive from 160 to 255.

Just weird.

Third

I'm not even remotely sure, but the CIDToGIDMap stream contains an AWFUL lot of nulls.

Bottom line

Those fonts are fishy. And I've never heard of "Bellflower Books" or "UFPDF 0.1"

That version number makes me cringe. It should make you cringe too.

Googling for "UFPDF", I found this note from the author:

Note: I wrote UFPDF as an experiment, not as a finished product. If you have problems using it, don't bug me for support. Patches are welcome though, but I don't have much time to maintain this.

UFPDF is a PHP library that sits on top of FPDF. Version 0.1. Just run away.



Source: https://stackoverflow.com/questions/5268909/converting-a-pdf-to-jpg-with-imagemagick-in-php-gives-odd-letter-spacing
