Calculate correct width of a text

面向向阳花 2021-01-15 03:07

I need to read a plan exported by AutoCAD to PDF and place some markers with text on it with PDFBox. Everything works fine, except the calculation of the width of the text,

1 answer
  • 2021-01-15 03:34

    Unfortunately the question and comments merely include (by running the sample project) the actual result for two source documents and the description

    The annotating text should be center aligned on the top and bottom marker, aligned to the left on the right marker and aligned to the right on the left marker. The alignment is not working for me, as the font.getStringWidth( .. ) returns only a fraction of what it seems to be. And the discrepancy seems to be different in both PDFs.

    but not a concrete sample discrepancy to repair.

    There are several issues in the code, though, which may lead to such observations (and other ones, too!). Fixing them should be done first; this may already resolve the issues observed by the OP.

    Which box to take

    The code of the OP derives several values from the media box:

    PDRectangle pageSize = page.findMediaBox();
    float pageWidth = pageSize.getWidth();
    float pageHeight = pageSize.getHeight();
    float lineWidth = Math.max(pageWidth, pageHeight) / 1000;
    float markerRadius = lineWidth * 10;
    float fontSize = Math.min(pageWidth, pageHeight) / 20;
    float fontPadding = Math.max(pageWidth, pageHeight) / 100;
    

    These seem to be chosen to be optically pleasing in relation to the page size. But the media box is not, in general, the final displayed or printed page size; the crop box is. Thus, it should be

    PDRectangle pageSize = page.findCropBox();
    

    (Actually the trim box, the intended dimensions of the finished page after trimming, might even be more apropos; the trim box defaults to the crop box. For details read here.)

    This is not relevant for the given sample documents as they do not contain explicit crop box definitions, so the crop box defaults to the media box. It might be relevant for other documents, though, e.g. those the OP could not include.
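    The defaulting rules mentioned above can be sketched in plain Java. This is only an illustrative model, not PDFBox code: the Box class and the effectiveCropBox, effectiveTrimBox and fontSizeFor helpers are hypothetical names, and a missing box is represented by null.

```java
final class PageBoxSketch {
    /** Minimal stand-in for a PDF rectangle; hypothetical, not a PDFBox class. */
    static final class Box {
        final float width;
        final float height;

        Box(float width, float height) {
            this.width = width;
            this.height = height;
        }
    }

    // The crop box defaults to the media box when it is absent (null here).
    static Box effectiveCropBox(Box mediaBox, Box cropBox) {
        return cropBox != null ? cropBox : mediaBox;
    }

    // The trim box defaults to the (effective) crop box when it is absent.
    static Box effectiveTrimBox(Box mediaBox, Box cropBox, Box trimBox) {
        return trimBox != null ? trimBox : effectiveCropBox(mediaBox, cropBox);
    }

    // Same formula as the OP's fontSize, fed with whichever box is passed in.
    static float fontSizeFor(Box box) {
        return Math.min(box.width, box.height) / 20;
    }

    public static void main(String[] args) {
        Box media = new Box(1000f, 800f);
        Box crop = new Box(600f, 400f);
        // Without an explicit crop box the media box decides the size.
        System.out.println(fontSizeFor(effectiveCropBox(media, null)));
        // With an explicit crop box the derived size changes.
        System.out.println(fontSizeFor(effectiveCropBox(media, crop)));
    }
}
```

    With an explicit crop box that is smaller than the media box, every value derived from the page size shrinks accordingly, which is why the choice of box matters for documents other than the two samples.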

    Which PDPageContentStream constructor to use

    The code of the OP adds a content stream to the page at hand using this constructor:

    PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
    

    This constructor appends (first true) and compresses (second true) but unfortunately it continues in the graphics state left behind by the pre-existing content.

    Details of the graphics state of importance for the observations at hand:

    • Transformation matrix - it may have been changed to scale (or rotate, skew, move ...) any new content added
    • Character spacing - it may have been changed to put any new characters added nearer to or farther from each other
    • Word spacing - it may have been changed to put any new words added nearer to or farther from each other
    • Horizontal scaling - it may have been changed to scale any new characters added
    • Text rise - it may have been changed to displace any new characters added vertically

    Thus, a constructor should be chosen which also resets the graphics state:

    PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);
    

    The third true tells PDFBox to reset the graphics state, i.e. to surround the former content with a save-state/restore-state operator pair.

    This is relevant for the given sample documents: at least the transformation matrix is changed in them.
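    To illustrate why a leftover transformation matrix matters, here is a toy model of the graphics state stack in plain Java. It is not PDFBox code; the class and method names are hypothetical, and only the current transformation matrix is modeled (save, restore and concat stand in for the q, Q and cm operators).

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class GraphicsStateSketch {
    // CTM as {a, b, c, d, e, f}: x' = a*x + c*y + e, y' = b*x + d*y + f
    private double[] ctm = {1, 0, 0, 1, 0, 0};
    private final Deque<double[]> saved = new ArrayDeque<>();

    // Models the save-state operator (q).
    void save() {
        saved.push(ctm.clone());
    }

    // Models the restore-state operator (Q).
    void restore() {
        ctm = saved.pop();
    }

    // Models the cm operator: the new CTM is the given matrix times the old CTM.
    void concat(double a, double b, double c, double d, double e, double f) {
        double[] m = ctm;
        ctm = new double[] {
            a * m[0] + b * m[2], a * m[1] + b * m[3],
            c * m[0] + d * m[2], c * m[1] + d * m[3],
            e * m[0] + f * m[2] + m[4], e * m[1] + f * m[3] + m[5]
        };
    }

    // Where a user space point ends up under the current CTM.
    double[] toDeviceSpace(double x, double y) {
        return new double[] {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }

    public static void main(String[] args) {
        GraphicsStateSketch gs = new GraphicsStateSketch();
        gs.save();                          // wrap the pre-existing content
        gs.concat(0.5, 0, 0, 0.5, 10, 20);  // existing content scales and moves
        double[] before = gs.toDeviceSpace(100, 100);
        gs.restore();                       // new content starts from a clean state
        double[] after = gs.toDeviceSpace(100, 100);
        System.out.println(before[0] + "," + before[1] + " vs " + after[0] + "," + after[1]);
    }
}
```

    Without the save and restore pair, a marker meant for (100, 100) would be drawn under the scaled and shifted matrix left behind by the existing content, and its text would be scaled as well.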

    Setting and using the CalRGB color space

    The OP's code sets the stroking and non-stroking color spaces to a calibrated color space:

    contentStream.setStrokingColorSpace(new PDCalRGB());
    contentStream.setNonStrokingColorSpace(new PDCalRGB());
    

    Unfortunately new PDCalRGB() does not create a valid CalRGB color space object because its required WhitePoint value is missing. Thus, before selecting a calibrated color space, initialize it properly.

    Thereafter the OP's code sets the colors using

    contentStream.setStrokingColor(marker.color.r, marker.color.g, marker.color.b);
    contentStream.setNonStrokingColor(marker.color.r, marker.color.g, marker.color.b);
    

    These (int, int, int) overloads unfortunately use the RG and rg operators, which implicitly select the DeviceRGB color space. To not overwrite the current color space, use the (float[]) overloads with normalized (0..1) values instead.

    While this is not relevant for the observed issue, it causes error messages in PDF viewers.
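    The normalization needed for the (float[]) overloads can be sketched as follows. This assumes that marker.color.r, g and b are ints in the range 0..255, which the question does not state explicitly; normalizedRgb is a hypothetical helper, not a PDFBox method.

```java
final class ColorComponentsSketch {
    // Converts 0..255 int components (an assumption about the OP's marker
    // colors) to the normalized 0..1 floats expected by float[] overloads.
    static float[] normalizedRgb(int r, int g, int b) {
        return new float[] { clamp(r) / 255f, clamp(g) / 255f, clamp(b) / 255f };
    }

    // Keeps out-of-range input inside 0..255 instead of producing invalid values.
    private static int clamp(int component) {
        return Math.max(0, Math.min(255, component));
    }

    public static void main(String[] args) {
        float[] components = normalizedRgb(255, 0, 51);
        System.out.println(components[0] + " " + components[1] + " " + components[2]);
    }
}
```

    The resulting array would then be handed to the (float[]) overloads of setStrokingColor and setNonStrokingColor instead of the three ints.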

    Calculating the width of a drawn string

    The OP's code calculates the width of a drawn string using

    float textWidth = font.getStringWidth(marker.id) * 0.043f;
    

    and the OP is surprised

    The * 0.043f works as an approximation for one document, but fails for the next.

    There are two factors building this "magic" number:

    • As the OP has remarked, the glyph coordinate space is set up in units of 1/1000 of the user coordinate space, and the number returned is in glyph space; thus a factor of 0.001.

    • The OP has overlooked that he wants the width of the string at the font size he selected. The font object has no knowledge of the current font size and returns the width for a font size of 1. As the OP selects the font size dynamically as Math.min(pageWidth, pageHeight) / 20, this factor varies: it is about 42 for the two given sample documents but probably totally different for other documents.
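    Put together, the two factors replace the magic number by a calculation like the following. This is a minimal sketch in plain Java: userSpaceWidth is a hypothetical helper, glyphSpaceWidth stands for the value returned by font.getStringWidth(marker.id), and the sketch assumes that character spacing, word spacing and horizontal scaling are at their defaults.

```java
final class TextWidthSketch {
    // Glyph space is set up in 1/1000 of a text space unit.
    static final float GLYPH_UNITS_PER_TEXT_UNIT = 1000f;

    // glyphSpaceWidth: string width as reported for font size 1, in glyph units.
    // fontSize: the font size actually selected for drawing.
    static float userSpaceWidth(float glyphSpaceWidth, float fontSize) {
        return glyphSpaceWidth / GLYPH_UNITS_PER_TEXT_UNIT * fontSize;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a string 2000 glyph units wide drawn at size 42.
        System.out.println(userSpaceWidth(2000f, 42f));
    }
}
```

    In the OP's code this would correspond to font.getStringWidth(marker.id) / 1000 * fontSize, so the result follows the dynamically chosen font size instead of relying on a constant tuned for one document.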

    Positioning text

    The OP's code positions the text like this starting from identity text matrices:

    contentStream.moveTextPositionByAmount(
        marker.endX + marker.getXTextOffset(textWidth, fontPadding),
        marker.endY + marker.getYTextOffset(fontSize, fontPadding));
    

    using methods getXTextOffset and getYTextOffset:

    public float getXTextOffset(float textWidth, float fontPadding) {
        if (getLocation() == Location.TOP)
            return (textWidth / 2 + fontPadding) * -1;
        else if (getLocation() == Location.BOTTOM)
            return (textWidth / 2 + fontPadding) * -1;
        else if (getLocation() == Location.RIGHT)
            return 0 + fontPadding;
        else
            return (textWidth + fontPadding) * -1;
    }
    
    public float getYTextOffset(float fontSize, float fontPadding) {
        if (getLocation() == Location.TOP)
            return 0 + fontPadding;
        else if (getLocation() == Location.BOTTOM)
            return (fontSize + fontPadding) * -1f;
        else
            return fontSize / 2 * -1;
    }
    

    In case of getXTextOffset I doubt that adding fontPadding for Location.TOP and Location.BOTTOM makes sense, especially in the light of the OP's desire

    The annotating text should be center aligned on the top and bottom marker
    

    For the text to be centered it should not be shifted off-center.
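    A possible variant of getXTextOffset without the off-center shift might look like this. It is a sketch only: it is written as a static method with the location passed in so that it is self-contained, and the Location enum here merely mirrors the one used in the OP's code.

```java
final class XOffsetSketch {
    enum Location { TOP, BOTTOM, LEFT, RIGHT }

    // Horizontal offset of the text start relative to the marker end point.
    static float xTextOffset(Location location, float textWidth, float fontPadding) {
        switch (location) {
            case TOP:
            case BOTTOM:
                // Centered: half the text lies on each side, no padding shift.
                return -textWidth / 2;
            case RIGHT:
                // Text starts to the right of the marker, padded away from it.
                return fontPadding;
            default:
                // LEFT: text ends to the left of the marker, padded away from it.
                return -(textWidth + fontPadding);
        }
    }

    public static void main(String[] args) {
        System.out.println(xTextOffset(Location.TOP, 84f, 5f));
        System.out.println(xTextOffset(Location.LEFT, 84f, 5f));
    }
}
```

    The textWidth argument is expected to be the width in user space at the selected font size, not the raw glyph space value.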

    The case of getYTextOffset is more difficult. The OP's code is built upon two misunderstandings: It assumes

    • that the text position selected by moveTextPositionByAmount is the lower left, and
    • that the font size is the character height.

    Actually the text position lies on the baseline; the glyph origin of the next drawn glyph will be positioned there, e.g.

    [Figure: glyph origin, width, and bounding box for 'g']

    Thus, the y position either has to be corrected to take the descent into account (for centering on the whole glyph height) or only use the ascent (for centering on the above-baseline glyph height).

    And a font size does not denote the actual character height but is arranged so that the nominal height of tightly spaced lines of text is 1 unit for font size 1. "Tightly spaced" implies that some small amount of additional inter-line space is contained in the font size.

    In essence for centering vertically one has to decide what to center on, whole height or above-baseline height, first letter only, whole label, or all font glyphs. PDFBox does not readily supply the necessary information for all cases but methods like PDFont.getFontBoundingBox() should help.
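    One way to base getYTextOffset on font metrics instead of the font size is sketched below. The assumptions are considerable: ascent and descent are taken to be given in glyph space units (1/1000) with a negative descent, the side labels are centered on the whole glyph height, and yTextOffset is a hypothetical helper. Where the metrics come from (for example the font bounding box or a font descriptor) is left open.

```java
final class YOffsetSketch {
    enum Location { TOP, BOTTOM, LEFT, RIGHT }

    // Vertical offset of the baseline relative to the marker end point.
    // ascent and descent are in glyph units; descent is assumed to be negative.
    static float yTextOffset(Location location, float ascent, float descent,
                             float fontSize, float fontPadding) {
        float scale = fontSize / 1000f;  // glyph units to user space units
        float up = ascent * scale;       // extent above the baseline
        float down = descent * scale;    // extent below the baseline (negative)
        switch (location) {
            case TOP:
                // Lift the baseline so that descenders end fontPadding above the marker.
                return fontPadding - down;
            case BOTTOM:
                // Lower the baseline so that ascenders end fontPadding below the marker.
                return -(fontPadding + up);
            default:
                // LEFT and RIGHT: center the whole glyph height on the marker.
                return -(up + down) / 2;
        }
    }

    public static void main(String[] args) {
        // Hypothetical metrics: ascent 800, descent -200, font size 10, padding 1.
        System.out.println(yTextOffset(Location.TOP, 800f, -200f, 10f, 1f));
        System.out.println(yTextOffset(Location.RIGHT, 800f, -200f, 10f, 1f));
    }
}
```

    Centering on the above-baseline height only would use -up / 2 in the default branch; which variant looks right depends on the labels being drawn.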
