Text rotation in PDF

谁说我不能喝 提交于 2019-12-11 06:34:09

问题


So I have this situation:

using pdftoxml.exe from sourceforge.net I got text tokens and their coordinates. If the pdf file was rotated (i.e. it has a /Rotate 90 written in its source) pdftoxml.exe swaps height and width of a given page and also x and y coordinates of any given object. That is what I understand.

I was happy with it, until I came across a pdf file which used re to draw thick lines. That is, for a thick line, 4 thin lines are drawn and the space is filled, like in this picture. On the left you see two thin lines (non colored), which are part of a bigger rectangle (highly zoomed in). I emptied the space inbetween which was actually filled with black, to see the lines:

Additionally, above pdf is rotated. So to get B upright in the end, this textmatrix was used: 0 1 -1 0 90.72 28.3705 Tm. The thin lines were drawn like this from 83.04 27.891 0.48 0.48 re (coordinates may vary here, but it was some re operation like that. The operation goes like x y width height re and re is for rectangle from adobe's pdf 1.7 page 133). What is relevant here is the calculation 27.891 + 0.48 = 28.371 which is not rounded or altered because of floating-point issues. It is the exact value for the line's x and unfortunately, it is bigger than the hard coded B's x which is 28.3705 :

83.52 27.891 m 92.39999999999999 27.891 l s

92.39999999999999 27.891 m 92.39999999999999 28.371 l s

92.39999999999999 28.371 m 83.52 28.371 l s

83.52 28.371 m 83.52 27.891 l s

The page's coordinates go like 842 x 595,2 according to PDFXChange viewer from upper left corner. Which seems natural since the page is rotated. Unrotated, it would be the lower left corner, so that ought to be ok.


When the text is altered with 1 0 0 1 90.72 28.3705 Tm into its original orientation, one can see the collapsing bottom line with the line on the left:

which is what I would expect, since B 's y is 28.3705 and and the line's horizontal position is 28.371 (as can be seen on the second line of above code lines). So probabyly B's bottom line falls beyond the 28.371 but I could not zoom that.

Now where does the gap between the line and the B come from in the first picture? This is important to me because I was trying to figure out which is the closest line on the left to B and was surprised by the two values, namely the suppsed x value of the text I get from pdftoxml.exe which is 28.3705 and the lines horizontal value 28.371. Since I knew the line is actually far beyond the left of the B that could not be correct, at least not in the sense of "take x position of line, take x position of B, compare, and if the line's x is less then than B's x, the line is on the left".

I can't locate the correct line with the x values. Instead I get the other line on the very left...like as if the text was falling inbetween them two.

This is the text drawing code:

BT
%0 7.5 -7.5 0 90.72 28.3705 Tm
0 1 -1 0 90.72 28.3705 Tm
%1 0 0 1 90.72 28.3705 Tm
/F1 1 Tf
1 Tr
q
0.01 w
(B) Tj
Q
ET

so, there is nothing fancy happening with the B's size or line thickness.

Can you help me figure out?


This is an updated picture with two I drawn on the same page, for the upper I using 0 1 -1 0 90.72 28.3705 Tm (rotated 90 degrees mathematically), for the lower one 1 0 0 1 90.72 28.3705 Tm. So I don't get it, how is the lower I rotated +90 and ends up being the upper one?

Here is the pdf code. It is rather big, but you should be able to copy it into your file and name it sth.pdf.

PDF Sample ( you have to actually zoom into the upper left corner real big to see the I )

EDIT I actually found some interesting information about finding the glyph bounding box, but I could not yet put the pieces together.


回答1:


Please have a look at

The glyph origin is the point (0, 0) in the glyph coordinate system. Tj and other text-showing operators shall position the origin of the first glyph to be painted at the origin of text space.

(shamelessly copied from Figure 39, section 9.2.4 of ISO 32000-1).

As you can see, the coordinates where the glyph is positioned, the glyph origin, is not necessarily where the actual glyph bounding box starts. This may explain the gap in your first image.

Thus, when you are trying to figure out which is the closest line on the left to B optically, it does not suffice to take x position of line, take x position of B, compare, and if the line's x is less then than B's x, the line is on the left, instead you also have to take the font data themselves into account and factor in the gap between glyph origin and glyph bounding box of the glyph represented by B.

For a more in-depth analysis please supply the font data.

EDIT concerning your double-I question... in your comment above you say you actually expected to see a common point - the rotation point - in both I characters, so you can get hands on a reliable horizontal coordinate for the left bounding box side of a character.

Isn't the point where the red lines cross, your rotation point? It should be the glyph origin for both Tj operations, and the I-glyphs have their origins there. Now you can measure from there on.



来源:https://stackoverflow.com/questions/14596106/text-rotation-in-pdf

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!