The result is that the image is not placed correctly over the text. Am I getting the text positions wrong?
This is an example of how to get the x/y coordinates and size of each character
You use text.getXDirAdj() and text.getYDirAdj() as x and y coordinates in the content stream. This won't work because the coordinates PDFBox uses during text extraction are transformed into a coordinate system that it prefers for text extraction purposes, cf. the JavaDocs:
/**
* This will get the text direction adjusted x position of the character.
* This is adjusted based on text direction so that the first character
* in that direction is in the upper left at 0,0.
*
* @return The x coordinate of the text.
*/
public float getXDirAdj()
/**
* This will get the y position of the text, adjusted so that 0,0 is upper left and it is
* adjusted based on the text direction.
*
* @return The adjusted y coordinate of the character.
*/
public float getYDirAdj()
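To see why such values land in the wrong place when used directly in a content stream, here is a minimal sketch of the y flip. It assumes an unrotated page and horizontal text; the class and method names are made up for illustration and are not PDFBox API.

```java
// Hypothetical helper, not part of PDFBox: illustrates top-left vs. bottom-left y values.
final class YFlipSketch {
    /** Converts a y value measured downwards from the top edge into one measured upwards from the bottom edge. */
    static float topDownToBottomUp(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed page height in default user space units
        float yFromTop = 100f;   // a value as a top-left based system would report it
        // Used unchanged in a content stream, 100 would mean "100 units above the bottom edge",
        // although the glyph actually sits 692 units above the bottom edge.
        System.out.println(topDownToBottomUp(yFromTop, pageHeight));
    }
}
```

So an image drawn at the direction-adjusted y value ends up mirrored vertically relative to the glyph it was meant to cover.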
For a TextPosition text you should instead use text.getTextMatrix().getTranslateX() and text.getTextMatrix().getTranslateY()
But even these numbers may have to be corrected, cf. this answer, because PDFBox has multiplied the matrix by a translation making the lower left corner of the crop box the origin.
Thus, if PDRectangle cropBox is the crop box of the current page, use
text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX()
and
text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY()
(This coordinate normalization of PDFBox is a PITA for anyone who actually wants to work with the text coordinates...)
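The correction is plain addition. As a minimal sketch with made-up numbers (plain floats stand in for the text matrix translation and the crop box corner; the helper name is hypothetical and not PDFBox API):

```java
// Hypothetical helper, not PDFBox API: plain floats stand in for text matrix and crop box values.
final class CropBoxOffsetSketch {
    /** Adds the crop box's lower left corner back to a translation that was normalized to it. */
    static float[] toPageCoordinates(float translateX, float translateY,
                                     float cropLowerLeftX, float cropLowerLeftY) {
        return new float[] { translateX + cropLowerLeftX, translateY + cropLowerLeftY };
    }

    public static void main(String[] args) {
        // Assumed example values: a crop box starting at (20, 30), a glyph reported at (100, 500).
        float[] p = toPageCoordinates(100f, 500f, 20f, 30f);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

For a page whose crop box starts at the origin the offsets are zero, which is why the problem only shows up with some documents.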
Your code has some other issues, one of them becoming clear with the document you shared: You append to the page content stream without resetting the graphics context:
PDPageContentStream contentStream = new PDPageContentStream(pdocument,
stripper.getCurrentPage(), true, true);
The constructor with this signature assumes you don't want to reset the context. Use the one with an additional boolean parameter and set that to true to request context resets:
PDPageContentStream contentStream = new PDPageContentStream(pdocument,
stripper.getCurrentPage(), true, true, true);
Now the context is reset and the position is ok again.
Both these constructors are deprecated, though, and shouldn't be used for that reason. In the development branch they have been removed already. Instead use
PDPageContentStream contentStream = new PDPageContentStream(pdocument,
stripper.getCurrentPage(), AppendMode.APPEND, true, true);
This introduces another issue, though: You create a new PDPageContentStream for each writeString call. If that is done with context reset each time, the nesting of saveGraphicsState/restoreGraphicsState pairs may become pretty deep. Thus, you should only create one such content stream per page and use it in all writeString calls for that page.
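To illustrate how the nesting grows, here is a simplified model, not PDFBox code. It assumes that each append with context reset puts a stream containing q in front of the existing content and starts the new stream with Q; all names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not PDFBox code: each list entry stands for one content stream of a page.
final class ResetNestingSketch {
    /** Models one append with context reset: "q" goes in front of everything, the new stream starts with "Q". */
    static void appendWithReset(List<String> streams, String drawing) {
        streams.add(0, "q");
        streams.add("Q " + drawing);
    }

    /** Returns the deepest q/Q nesting reached while reading the streams in order. */
    static int maxDepth(List<String> streams) {
        int depth = 0;
        int max = 0;
        for (String stream : streams) {
            for (String op : stream.split("\\s+")) {
                if (op.equals("q")) {
                    depth++;
                    max = Math.max(max, depth);
                } else if (op.equals("Q")) {
                    depth--;
                }
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> streams = new ArrayList<>();
        streams.add("BT ET");                  // stands for the original page content
        for (int i = 0; i < 50; i++) {         // one new content stream per writeString call
            appendWithReset(streams, "Do");
        }
        // Under this model the depth equals the number of appended streams.
        System.out.println(maxDepth(streams));
    }
}
```

With a single content stream per page the same model never goes deeper than one level, however many images are drawn into it.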
Thus, your text stripper sub-class might look like this:
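import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

class CoverCharByImage extends PDFTextStripper {
    public CoverCharByImage(PDImageXObject pdImage) throws IOException {
        super();
        this.pdImage = pdImage;
    }

    final PDImageXObject pdImage;
    // One content stream per page, created lazily in writeString
    PDPageContentStream contentStream = null;

    @Override
    public void processPage(PDPage page) throws IOException {
        super.processPage(page);
        // Close the stream once the page is done so the next page gets its own
        if (contentStream != null) {
            contentStream.close();
            contentStream = null;
        }
    }

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        // Append with context reset, but only once per page
        if (contentStream == null)
            contentStream = new PDPageContentStream(document, getCurrentPage(), AppendMode.APPEND, true, true);

        PDRectangle cropBox = getCurrentPage().getCropBox();

        for (TextPosition text : textPositions) {
            if (text.getUnicode().equals("a")) {
                // Text matrix translation plus crop box offset gives the position in page coordinates
                contentStream.drawImage(pdImage, text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX(),
                        text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY(),
                        text.getWidthDirAdj(), text.getHeightDir());
            }
        }
    }
}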
(CoverCharacterByImage inner class)
and it may be used like this:
PDDocument pdocument = PDDocument.load(...);
String imagePath = ...;
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);
CoverCharByImage stripper = new CoverCharByImage(pdImage);
stripper.setSortByPosition(true);
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(pdocument, dummy);
pdocument.save(...);
(CoverCharacterByImage test testCoverLikeLez)
resulting in [image of the output page] etc.