I need to read a plan exported by AutoCAD to PDF and place some markers with text on it with PDFBox. Everything works fine, except the calculation of the width of the text,
Unfortunately the question and comments merely include (by running the sample project) the actual result for two source documents and the description
The annotating text should be center aligned on the top and bottom marker, aligned to the left on the right marker and aligned to the right on the left marker. The alignment is not working for me, as the font.getSTringWidth( .. ) returns only a fraction of what it seems to be. And the discrepance seems to be different in both PDFs.
but not a concrete sample discrepancy to repair.
There are several issues in the code, though, which may lead to such observations (and other ones, too!). Fixing them should be done first; this may already resolve the issues observed by the OP.
The code of the OP derives several values from the media box:
PDRectangle pageSize = page.findMediaBox();
float pageWidth = pageSize.getWidth();
float pageHeight = pageSize.getHeight();
float lineWidth = Math.max(pageWidth, pageHeight) / 1000;
float markerRadius = lineWidth * 10;
float fontSize = Math.min(pageWidth, pageHeight) / 20;
float fontPadding = Math.max(pageWidth, pageHeight) / 100;
These seem to be chosen to be optically pleasing in relation to the page size. But the media box is not, in general, the final displayed or printed page size, the crop box is. Thus, it should be
PDRectangle pageSize = page.findCropBox();
(Actually the trim box, the intended dimensions of the finished page after trimming, might even be more apropos; the trim box defaults to the crop box. For details read here.)
This is not relevant for the given sample documents as they do not contain explicit crop box definitions, so the crop box defaults to the media box. It might be relevant for other documents, though, e.g. those the OP could not include.
The code of the OP adds a content stream to the page at hand using this constructor:
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
This constructor appends (first true) and compresses (second true), but unfortunately it continues in the graphics state left behind by the pre-existing content.
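Details of the graphics state of importance for the observations at hand include the current transformation matrix, which may have been changed to scale, rotate, or move any newly added content, and text state parameters such as character spacing, word spacing, and horizontal scaling.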
Thus, a constructor should be chosen which also resets the graphics state:
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);
The third true tells PDFBox to reset the graphics state, i.e. to surround the former content with a save-state/restore-state operator pair.
This is relevant for the given sample documents: at least the transformation matrix is changed in them.
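To illustrate why an inherited transformation matrix matters, here is a minimal sketch that uses java.awt.geom.AffineTransform as a stand-in for the current transformation matrix. The class and method names are hypothetical, and the matrix values in the usage note are made up for illustration; they are not taken from the sample documents.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class CtmEffect {
    /**
     * Length on the page of a horizontal run that is userSpaceWidth wide
     * in the coordinate system established by the given matrix.
     */
    static double displayedWidth(AffineTransform ctm, double userSpaceWidth) {
        // deltaTransform applies scaling/rotation/skew but ignores translation,
        // which is what we want for a width (a vector, not a point).
        Point2D vector = ctm.deltaTransform(new Point2D.Double(userSpaceWidth, 0), null);
        return Math.hypot(vector.getX(), vector.getY());
    }
}
```

With a leftover matrix that scales by 0.5, for example new AffineTransform(0.5, 0, 0, 0.5, 10, 20), a string calculated to be 100 units wide would only cover 50 units on the page. A discrepancy of this kind disappears once the graphics state is reset.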
The OP's code sets the stroking and non-stroking color spaces to a calibrated color space:
contentStream.setStrokingColorSpace(new PDCalRGB());
contentStream.setNonStrokingColorSpace(new PDCalRGB());
Unfortunately new PDCalRGB() does not create a valid CalRGB color space object: its required WhitePoint value is missing. Thus, before selecting a calibrated color space, initialize it properly.
Thereafter the OP's code sets the colors using
contentStream.setStrokingColor(marker.color.r, marker.color.g, marker.color.b);
contentStream.setNonStrokingColor(marker.color.r, marker.color.g, marker.color.b);
These (int, int, int) overloads unfortunately use the RG and rg operators, which implicitly select the DeviceRGB color space. To avoid overwriting the current color space, use the (float[]) overloads with normalized (0..1) values instead.
While this is not relevant for the observed issue, it causes error messages in PDF viewers.
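The normalization can be sketched as a small helper. This assumes that marker.color.r, g and b are ints in the range 0..255, which the use of the (int, int, int) overloads suggests but the question does not state explicitly; the class and method names are hypothetical.

```java
final class ColorComponents {
    /** Converts 0..255 integer RGB components to normalized 0..1 floats. */
    static float[] normalizeRgb(int r, int g, int b) {
        return new float[] { clamp(r) / 255f, clamp(g) / 255f, clamp(b) / 255f };
    }

    // Out-of-range input is clamped rather than rejected (a choice made for this sketch).
    private static int clamp(int value) {
        return Math.max(0, Math.min(255, value));
    }
}
```

The resulting array would then be handed to the (float[]) overloads of setStrokingColor and setNonStrokingColor, which leave the previously selected color space in place.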
The OP's code calculates the width of a drawn string using
float textWidth = font.getStringWidth(marker.id) * 0.043f;
and the OP is surprised:
The * 0.043f works as an approximation for one document, but fails for the next.
There are two factors building this "magic" number:
As the OP has remarked, the glyph coordinate space is set up in 1/1000 of the user coordinate space, and the number returned is in glyph space; thus a factor of 0.001.
What the OP has ignored is that he wants the width of the string at the font size he selected. The font object has no knowledge of the current font size and returns the width for a font size of 1. As the OP selects the font size dynamically as Math.min(pageWidth, pageHeight) / 20, this factor varies: for the two given sample documents it is about 42, but it is probably totally different in other documents.
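Put together, the two factors replace the magic number. The following is a minimal sketch with hypothetical class and method names; it assumes the usual 1/1000 glyph space and that character spacing, word spacing and horizontal scaling are at their defaults.

```java
final class TextWidths {
    /** Glyph space units per text space unit, as described above. */
    static final float GLYPH_UNITS_PER_TEXT_UNIT = 1000f;

    /**
     * Converts a width in glyph space (as returned for a font size of 1)
     * to the width at the given font size.
     */
    static float toUserSpace(float glyphSpaceWidth, float fontSize) {
        return glyphSpaceWidth / GLYPH_UNITS_PER_TEXT_UNIT * fontSize;
    }
}
```

In the OP's code this would be called as TextWidths.toUserSpace(font.getStringWidth(marker.id), fontSize). For a font size of 43 the combined factor is 43 / 1000 = 0.043, which explains why the OP's approximation happened to fit one document.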
The OP's code positions the text like this starting from identity text matrices:
contentStream.moveTextPositionByAmount(
marker.endX + marker.getXTextOffset(textWidth, fontPadding),
marker.endY + marker.getYTextOffset(fontSize, fontPadding));
using methods getXTextOffset and getYTextOffset:
public float getXTextOffset(float textWidth, float fontPadding) {
    if (getLocation() == Location.TOP)
        return (textWidth / 2 + fontPadding) * -1;
    else if (getLocation() == Location.BOTTOM)
        return (textWidth / 2 + fontPadding) * -1;
    else if (getLocation() == Location.RIGHT)
        return 0 + fontPadding;
    else
        return (textWidth + fontPadding) * -1;
}
public float getYTextOffset(float fontSize, float fontPadding) {
    if (getLocation() == Location.TOP)
        return 0 + fontPadding;
    else if (getLocation() == Location.BOTTOM)
        return (fontSize + fontPadding) * -1f;
    else
        return fontSize / 2 * -1;
}
In case of getXTextOffset I doubt that adding fontPadding for Location.TOP and Location.BOTTOM makes sense, especially in the light of the OP's desire:
The annotating text should be center aligned on the top and bottom marker
For the text to be centered it should not be shifted off-center.
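A possible variant of the horizontal offset calculation without the off-center shift is sketched below. It is a standalone rewrite for illustration, not the OP's class: the enum and the static method signature are assumptions, and the text width passed in is expected to be the width at the selected font size.

```java
final class MarkerOffsets {
    enum Location { TOP, BOTTOM, LEFT, RIGHT }

    /** Horizontal offset of the text start relative to the marker end point. */
    static float xTextOffset(Location location, float textWidth, float fontPadding) {
        switch (location) {
            case TOP:
            case BOTTOM:
                // Centered: shift left by half the width, no padding.
                return -textWidth / 2;
            case RIGHT:
                // Text starts to the right of the marker, padded.
                return fontPadding;
            default:
                // LEFT: text ends to the left of the marker, padded.
                return -(textWidth + fontPadding);
        }
    }
}
```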
The case of getYTextOffset is more difficult. The OP's code is built upon two misunderstandings: It assumes that the text position selected by moveTextPositionByAmount is the lower left, and that the font size is the character height.
Actually the text position lies on the base line; the glyph origin of the next drawn glyph will be positioned there.
Thus, the y position either has to be corrected to take the descent into account (for centering on the whole glyph height) or only use the ascent (for centering on the above-baseline glyph height).
And a font size does not denote the actual character height but is arranged so that the nominal height of tightly spaced lines of text is 1 unit for font size 1. "Tightly spaced" implies that some small amount of additional inter-line space is contained in the font size.
In essence for centering vertically one has to decide what to center on, whole height or above-baseline height, first letter only, whole label, or all font glyphs. PDFBox does not readily supply the necessary information for all cases but methods like PDFont.getFontBoundingBox() should help.
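The two centering options can be sketched as follows. The class and method names are hypothetical. The sketch assumes that ascent and descent are given in glyph space units (1/1000 of the font size) with the descent as a negative number, as font metrics usually report them; where these values come from (font descriptor, font bounding box, or the glyphs of the actual label) is exactly the decision described above.

```java
final class VerticalCentering {
    /**
     * Offset to add to the marker's y coordinate so that the span from
     * descent to ascent is vertically centered on the marker.
     */
    static float baselineOffsetForWholeHeight(float ascent, float descent, float fontSize) {
        // Middle of the glyph span lies (ascent + descent) / 2 above the base line.
        return -(ascent + descent) / 2f * fontSize / 1000f;
    }

    /**
     * Offset to add to the marker's y coordinate so that only the part
     * above the base line is vertically centered on the marker.
     */
    static float baselineOffsetForAscentOnly(float ascent, float fontSize) {
        return -ascent / 2f * fontSize / 1000f;
    }
}
```

For an ascent of 800, a descent of -200 and a font size of 10, the base line would have to be placed 3 units below the marker to center on the whole height, or 4 units below it to center on the above-baseline height.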