I am using PDF Clown with pdfclown-0.2.0-HEAD.jar. I have written the code below to highlight a searched keyword in a Chinese-language PDF file, and the same code is working fine wi
The PDF Clown version you retrieved from Tymate's Maven repository on GitHub was pushed there on April 23rd, 2015. The final (as of now) check-in to the trunk of the PDF Clown Subversion source code repository on SourceForge, on the other hand, is from May 27th, 2015, and there are some 30 check-ins after April 23rd, 2015. Thus, you are definitely not using the most current version of this apparently dead PDF library project.
I tested your code with the 0.2.0 development version compiled from that trunk and the result indeed is different:
It is better insofar as the highlights have the width of the sought character and are located nearer to the actual character position. There still is a bug, though, as the second and third match highlights are somewhat off.
The remaining problem is not actually related to the language of the text. It is simply a bug in the processing of one type of PDF text drawing command, so it can be observed in documents with text in any language. Because these commands are rarely used nowadays, though, the bug is hardly ever observed, let alone reported. Your PDF, on the other hand, does make use of that kind of text drawing command.
The bug is in the ShowText class (package org.pdfclown.documents.contents.objects). At the end of the scan method, the text line matrix in the graphics state is updated like this if the ShowText instance actually is a ShowTextToNextLine instance derived from it:
    if(textScanner == null)
    {
      state.setTm(tm);
      if(this instanceof ShowTextToNextLine)
      {state.setTlm((AffineTransform)tm.clone());}
    }
The text line matrix here is set to the text matrix after the move to the next line and the drawing of the text. This is wrong; it must instead be set to the text matrix right after the move to the next line, before the drawing of the text.
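To make the difference concrete, here is a minimal standalone sketch using only java.awt.geom.AffineTransform. It does not use any PDF Clown classes; the class name TextLineMatrixSketch, its helper methods, and the sample numbers (line start at 100/700, leading 14, text width 50) are made up for illustration, and text scaling and rotation are ignored. It compares where the next line would start if the text line matrix is copied from the text matrix after drawing (the behaviour described above) with where it starts if the text line matrix is only moved down by the leading.

```java
import java.awt.geom.AffineTransform;

public class TextLineMatrixSketch {
    /** Text line matrix after a move to the next line: shift by (0, -leading) in text space. */
    static AffineTransform moveToNextLine(AffineTransform tlm, double leading) {
        AffineTransform result = (AffineTransform) tlm.clone();
        result.concatenate(new AffineTransform(1, 0, 0, 1, 0, -leading));
        return result;
    }

    /** Text matrix after drawing text of the given width: shift by (textWidth, 0) in text space. */
    static AffineTransform advance(AffineTransform tm, double textWidth) {
        AffineTransform result = (AffineTransform) tm.clone();
        result.concatenate(new AffineTransform(1, 0, 0, 1, textWidth, 0));
        return result;
    }

    /** Line start x when the text line matrix ignores the drawn text (intended behaviour). */
    static double correctLineStartX(double x, double y, double leading, double textWidth) {
        AffineTransform tlm = moveToNextLine(AffineTransform.getTranslateInstance(x, y), leading);
        // Drawing the text changes only the text matrix, not the text line matrix.
        advance(tlm, textWidth);
        return tlm.getTranslateX();
    }

    /** Line start x when the text line matrix is copied from the text matrix after drawing. */
    static double buggyLineStartX(double x, double y, double leading, double textWidth) {
        AffineTransform tlm = moveToNextLine(AffineTransform.getTranslateInstance(x, y), leading);
        AffineTransform tm = advance(tlm, textWidth);
        tlm = (AffineTransform) tm.clone(); // the drawn text's width leaks into the line start
        return tlm.getTranslateX();
    }

    public static void main(String[] args) {
        System.out.println("correct line start x: " + correctLineStartX(100, 700, 14, 50)); // 100.0
        System.out.println("buggy line start x:   " + buggyLineStartX(100, 700, 14, 50));   // 150.0
    }
}
```

In the second case every following line starts further to the right by the width of the previously drawn text, which would be consistent with later match highlights being offset while the first one is fine.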
This can be fixed e.g. like this:
    if(textScanner == null)
    {
      state.setTm(tm);
      // Move the text line matrix down by the leading only; the drawn text must not affect it.
      if(this instanceof ShowTextToNextLine)
        state.getTlm().concatenate(new AffineTransform(1, 0, 0, 1, 0, -state.getLead()));
    }
With this change in place the result looks like this: