Placing an image over text, by using the text position in a PDF using PDFBox.

The result is that the image is not placed correctly over the text. Am I getting the text positions wrong?

This is an example of how to get the x/y coordinates and size of each char

Retrieving sensible coordinates

You use text.getXDirAdj() and text.getYDirAdj() as x and y coordinates in the content stream. This won't work because the coordinates PDFBox uses during text extraction are transformed into a coordinate system it prefers for text extraction purposes, cf. the JavaDocs:

/**
 * This will get the text direction adjusted x position of the character.
 * This is adjusted based on text direction so that the first character
 * in that direction is in the upper left at 0,0.
 *
 * @return The x coordinate of the text.
 */
public float getXDirAdj()

/**
 * This will get the y position of the text, adjusted so that 0,0 is upper left and it is
 * adjusted based on the text direction.
 *
 * @return The adjusted y coordinate of the character.
 */
public float getYDirAdj()
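
To see why such values cannot go straight into a content stream, the sketch below models the flip for the simplest case only: an unrotated page with horizontal text, where the adjusted y is measured downward from the top of the page. The class name, method name, and sample numbers are hypothetical, and the code uses plain floats rather than PDFBox types; it is an illustration under those assumptions, not PDFBox's implementation.

```java
// Illustrative model, not PDFBox code: assumes an unrotated page and
// horizontal text, so only the y axis is flipped.
public class DirAdjSketch {

    /** Converts a top-down y (origin at upper left) to a bottom-up y (origin at lower left). */
    static float topDownToBottomUp(float yTopDown, float pageHeight) {
        return pageHeight - yTopDown;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed page height in points (US Letter)
        float yDirAdj = 92f;     // hypothetical value of the kind getYDirAdj() reports

        // Used directly in a content stream, y = 92 lands near the bottom of the page,
        // although the glyph sits near the top in bottom-up coordinates.
        System.out.println(topDownToBottomUp(yDirAdj, pageHeight));
    }
}
```

For rotated pages or other text directions the adjustment is more involved, which is one more reason to read the position from the text matrix instead.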

For a TextPosition text you should instead use

text.getTextMatrix().getTranslateX()

and

text.getTextMatrix().getTranslateY()

But even these numbers may have to be corrected, cf. this answer, because PDFBox has multiplied the matrix by a translation making the lower left corner of the crop box the origin.

Thus, if PDRectangle cropBox is the crop box of the current page, use

text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX()

and

text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY()
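
As a worked example of this correction, here is a minimal sketch in plain Java. The class and method names and the sample numbers are made up for illustration; in real code the inputs would come from the text matrix and from the page's crop box as described above.

```java
// Illustrative arithmetic only; no PDFBox types are involved.
public class CropBoxOffsetSketch {

    /**
     * Adds the crop box's lower left corner back onto a text-matrix translation,
     * undoing the normalization that made that corner the origin.
     */
    static float[] toPageSpace(float translateX, float translateY,
                               float cropLowerLeftX, float cropLowerLeftY) {
        return new float[] { translateX + cropLowerLeftX, translateY + cropLowerLeftY };
    }

    public static void main(String[] args) {
        // Hypothetical page whose crop box starts at (20, 30) instead of (0, 0).
        float[] p = toPageSpace(100f, 700f, 20f, 30f);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

For the common case of a crop box starting at (0, 0) the correction is a no-op, which is why the problem often goes unnoticed until a document with a shifted crop box turns up.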

(This coordinate normalization of PDFBox is a PITA for anyone who actually wants to work with the text coordinates...)

Other issues

Your code has some other issues, one of them becoming clear with the document you shared: You append to the page content stream without resetting the graphics context:

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), true, true);

The constructor with this signature assumes you don't want to reset the context. Use the one with an additional boolean parameter and set that to true to request context resets:

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), true, true, true);

Now the context is reset and the position is ok again.
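
To illustrate why a leftover graphics state shifts the image, here is a small sketch using java.awt.geom.AffineTransform as a stand-in for the current transformation matrix. The 0.5 scale is a hypothetical leftover from the existing page content, not something taken from your document, and the class and method names are made up.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Illustrative model of the current transformation matrix; not PDFBox code.
public class GraphicsContextSketch {

    /** Maps a point given in content stream coordinates through the current matrix. */
    static Point2D place(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Hypothetical leftover state: earlier content scaled everything by 0.5
        // and never restored the graphics state.
        AffineTransform leftover = AffineTransform.getScaleInstance(0.5, 0.5);
        // After a context reset, appended content starts from the identity matrix.
        AffineTransform reset = new AffineTransform();

        System.out.println(place(leftover, 100, 700)); // ends up somewhere else on the page
        System.out.println(place(reset, 100, 700));    // stays where it was requested
    }
}
```

The same coordinates produce different positions depending on what the earlier content left behind, so correct text coordinates alone are not enough.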

Both these constructors are deprecated, though, and shouldn't be used for that reason. In the development branch they have been removed already. Instead use

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), AppendMode.APPEND, true, true);

This introduces another issue, though: You create a new PDPageContentStream for each writeString call. If that is done with context reset each time, the nesting of saveGraphicsState/restoreGraphicsState pairs may become pretty deep. Thus, you should only create one such content stream per page and use it in all writeString calls for that page.
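
The growth in nesting can be made concrete with a toy model. It assumes that every append with context reset wraps everything already on the page in one more save/restore pair ("q" ... "Q"); how PDFBox arranges this internally is not shown here, and all names in the sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of operator nesting; operators are plain strings, not PDFBox objects.
public class NestingDepthSketch {

    /** Wraps the existing operators in q ... Q and appends one new operator after them. */
    static List<String> appendWithReset(List<String> ops, String newOp) {
        List<String> result = new ArrayList<>();
        result.add("q");
        result.addAll(ops);
        result.add("Q");
        result.add(newOp);
        return result;
    }

    /** Returns the deepest level of unclosed q operators reached in the sequence. */
    static int maxDepth(List<String> ops) {
        int depth = 0;
        int max = 0;
        for (String op : ops) {
            if (op.equals("q")) {
                depth++;
                max = Math.max(max, depth);
            } else if (op.equals("Q")) {
                depth--;
            }
        }
        return max;
    }

    /** Nesting depth after the given number of appended content streams. */
    static int depthAfter(int appends) {
        List<String> ops = new ArrayList<>();
        ops.add("original page content");
        for (int i = 0; i < appends; i++) {
            ops = appendWithReset(ops, "image " + i);
        }
        return maxDepth(ops);
    }

    public static void main(String[] args) {
        System.out.println(depthAfter(1));
        System.out.println(depthAfter(50));
    }
}
```

In this model the depth grows by one with every writeString call, whereas a single content stream per page keeps it at one regardless of how many images are drawn.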

Thus, your text stripper sub-class might look like this:

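// Imports assume the PDFBox 2.x package layout.
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

class CoverCharByImage extends PDFTextStripper {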
    public CoverCharByImage(PDImageXObject pdImage) throws IOException {
        super();
        this.pdImage = pdImage;
    }

    final PDImageXObject pdImage;
    PDPageContentStream contentStream = null;

    @Override
    public void processPage(PDPage page) throws IOException {
        super.processPage(page);
        if (contentStream != null) {
            contentStream.close();
            contentStream = null;
        }
    }

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        if (contentStream == null)
            contentStream = new PDPageContentStream(document, getCurrentPage(), AppendMode.APPEND, true, true);

        PDRectangle cropBox = getCurrentPage().getCropBox();

        for (TextPosition text : textPositions) {
            if (text.getUnicode().equals("a")) {
                contentStream.drawImage(pdImage, text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX(),
                        text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY(),
                        text.getWidthDirAdj(), text.getHeightDir());
            }
        }
    }
}

(CoverCharacterByImage inner class)

and it may be used like this:

PDDocument pdocument = PDDocument.load(...);

String imagePath = ...;
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);

CoverCharByImage stripper = new CoverCharByImage(pdImage);
stripper.setSortByPosition(true);
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(pdocument, dummy);
pdocument.save(...);

(CoverCharacterByImage test testCoverLikeLez)

resulting in the image being drawn over every "a" in the document.
