Placing an image over text, by using the text position in a PDF using PDFBox.

轻奢々 2021-01-28 08:29

The result is that the image is not placed correctly over the text. Am I getting the text positions wrong?

This is an example on how to get the x/y coordinates and size of each char

1 answer
  • 2021-01-28 09:04

    Retrieving sensible coordinates

    You use text.getXDirAdj() and text.getYDirAdj() as x and y coordinates in the content stream. This won't work because the coordinates PDFBox uses during text extraction are transformed into a coordinate system it prefers for text extraction purposes, cf. the JavaDocs:

    /**
     * This will get the text direction adjusted x position of the character.
     * This is adjusted based on text direction so that the first character
     * in that direction is in the upper left at 0,0.
     *
     * @return The x coordinate of the text.
     */
    public float getXDirAdj()
    
    /**
     * This will get the y position of the text, adjusted so that 0,0 is upper left and it is
     * adjusted based on the text direction.
     *
     * @return The adjusted y coordinate of the character.
     */
    public float getYDirAdj()
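
    To illustrate why such "upper left at 0,0" values cannot be fed straight into a content stream, here is a minimal sketch in plain Java (no PDFBox required). It assumes unrotated text on an unrotated page, where the adjusted y value is measured downwards from the top edge while content stream coordinates are measured upwards from the bottom. The class name, method name, and sample numbers are hypothetical.

```java
/** Hypothetical helper: contrasts a top-left-origin y value with a bottom-left-origin one. */
final class YAxisFlip {
    /**
     * Converts a y value measured downwards from the top of the page into
     * one measured upwards from the bottom (assumes no rotation).
     */
    static float topDownToBottomUp(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed page height in points (US Letter)
        float yFromTop = 92f;    // assumed value, as a top-left-origin API might report it

        // Drawing at y = 92 in the content stream would land near the bottom of the page,
        // although the glyph actually sits near the top.
        System.out.println("bottom-up y: " + topDownToBottomUp(yFromTop, pageHeight));
    }
}
```

    With text direction adjustments or page rotation the relation is more involved, which is one more reason to read the position from the text matrix rather than to invert the adjusted values by hand.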
    

    For a TextPosition text you should instead use

    text.getTextMatrix().getTranslateX()
    

    and

    text.getTextMatrix().getTranslateY()
    

    But even these numbers may have to be corrected, cf. this answer, because PDFBox has multiplied the matrix by a translation making the lower left corner of the crop box the origin.

    Thus, if PDRectangle cropBox is the crop box of the current page, use

    text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX()
    

    and

    text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY()
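
    The correction is a plain translation by the crop box's lower left corner. The following sketch shows the arithmetic with bare floats so it can be run without PDFBox; the class name, method name, and sample values are hypothetical and only stand in for the getTranslateX()/getTranslateY() and getLowerLeftX()/getLowerLeftY() values mentioned above.

```java
/** Hypothetical helper: undoes the crop-box normalization by adding the lower left corner back. */
final class CropBoxOffset {
    /** Returns {x, y} in page coordinates for a position reported relative to the crop box. */
    static float[] toPageCoordinates(float translateX, float translateY,
                                     float cropLowerLeftX, float cropLowerLeftY) {
        return new float[] { translateX + cropLowerLeftX, translateY + cropLowerLeftY };
    }

    public static void main(String[] args) {
        // Assumed sample values: a crop box whose lower left corner is at (20, 30)
        // and a text matrix translation of (100, 700).
        float[] p = toPageCoordinates(100f, 700f, 20f, 30f);
        System.out.println("x=" + p[0] + ", y=" + p[1]);
    }
}
```

    For the common case of a crop box starting at (0, 0) the correction is a no-op, which is probably why the problem only shows up with some documents.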
    

    (This coordinate normalization of PDFBox is a PITA for anyone who actually wants to work with the text coordinates...)

    Other issues

    Your code has some other issues, one of them becoming clear with the document you shared: You append to the page content stream without resetting the graphics context:

    PDPageContentStream contentStream = new PDPageContentStream(pdocument,
            stripper.getCurrentPage(), true, true);
    

    The constructor with this signature assumes you don't want to reset the context. Use the one with an additional boolean parameter and set that to true to request context resets:

    PDPageContentStream contentStream = new PDPageContentStream(pdocument,
            stripper.getCurrentPage(), true, true, true);
    

    Now the context is reset and the position is ok again.
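
    One likely mechanism behind the misplacement is a transformation left active by the existing page content: anything appended afterwards is positioned in that transformed coordinate system. The sketch below models this with java.awt.geom.AffineTransform from the JDK rather than with PDFBox; the class name and the leftover scale factor of 0.5 are assumptions chosen purely for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical demo: where a point ends up on the page under a given current transformation. */
final class LeftoverTransformDemo {
    /** Maps coordinates given in the current user space to default page space. */
    static Point2D place(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Assumed leftover state: earlier content scaled everything by 0.5 and never restored it.
        AffineTransform leftover = AffineTransform.getScaleInstance(0.5, 0.5);
        // After a context reset, appended content starts from the identity transformation.
        AffineTransform reset = new AffineTransform();

        System.out.println("without reset: " + place(leftover, 100, 700));
        System.out.println("with reset:    " + place(reset, 100, 700));
    }
}
```

    Under the assumed leftover scale the image would be drawn at half the intended distance from the origin, while after a reset the requested coordinates are used unchanged.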

    Both these constructors are deprecated, though, and shouldn't be used for that reason. In the development branch they have been removed already. Instead use

    PDPageContentStream contentStream = new PDPageContentStream(pdocument,
            stripper.getCurrentPage(), AppendMode.APPEND, true, true);
    

    This introduces another issue, though: You create a new PDPageContentStream for each writeString call. If that is done with context reset each time, the nesting of saveGraphicsState/restoreGraphicsState pairs may become pretty deep. Thus, you should only create one such content stream per page and use it in all writeString calls for that page.

    Thus, your text stripper sub-class might look like this:

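    import java.io.IOException;
    import java.util.List;

    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
    import org.apache.pdfbox.pdmodel.common.PDRectangle;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.apache.pdfbox.text.TextPosition;

    class CoverCharByImage extends PDFTextStripper {
        public CoverCharByImage(PDImageXObject pdImage) throws IOException {
            super();
            this.pdImage = pdImage;
        }
    
        final PDImageXObject pdImage;
        // One content stream per page, created lazily by the first writeString call.
        PDPageContentStream contentStream = null;
    
        @Override
        public void processPage(PDPage page) throws IOException {
            try {
                super.processPage(page);
            } finally {
                // Close the stream for this page even if text extraction failed.
                if (contentStream != null) {
                    contentStream.close();
                    contentStream = null;
                }
            }
        }
    
        @Override
        protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
            // Append to the page with compression and graphics context reset enabled.
            if (contentStream == null)
                contentStream = new PDPageContentStream(document, getCurrentPage(), AppendMode.APPEND, true, true);
    
            PDRectangle cropBox = getCurrentPage().getCropBox();
    
            for (TextPosition text : textPositions) {
                // Cover every glyph "a"; undo the crop-box normalization of the text matrix.
                if (text.getUnicode().equals("a")) {
                    contentStream.drawImage(pdImage, text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX(),
                            text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY(),
                            text.getWidthDirAdj(), text.getHeightDir());
                }
            }
        }
    }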
    

    (CoverCharacterByImage inner class)

    and it may be used like this:

    PDDocument pdocument = PDDocument.load(...);
    
    String imagePath = ...;
    PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);
    
    CoverCharByImage stripper = new CoverCharByImage(pdImage);
    stripper.setSortByPosition(true);
    Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
    stripper.writeText(pdocument, dummy);
    pdocument.save(...);
    

    (CoverCharacterByImage test testCoverLikeLez)

    resulting in

    etc.
