PDF Clown: not highlighting specific search keyword

陌清茗 2021-01-15 22:57

I am using PDF Clown with pdfclown-0.2.0-HEAD.jar. I have written the code below for highlighting the search keyword in a Chinese-language PDF file, and the same code is working fine wi…

1 Answer
  • 2021-01-15 23:54

    Your PDF Clown version

    The PDF Clown version you retrieved from Tymate's Maven repository on GitHub was pushed there on April 23rd, 2015. The final (as of now) check-in to the TRUNK of the PDF Clown Subversion source code repository on SourceForge, on the other hand, is from May 27th, 2015. There actually are some 30 check-ins after April 23rd, 2015. Thus, you definitely are not using the most current version of this apparently dead PDF library project.

    Using the current 0.2.0 snapshot

    I tested your code with the 0.2.0 development version compiled from that trunk and the result indeed is different:

    [Screenshot: still somewhat buggy]

    It is better insofar as the highlights have the width of the sought character and are located closer to the actual character position. There is still a bug, though: the second and third match highlights are somewhat off.

    Fixing the bug

    The remaining problem actually is not related to the language of the text. It simply is a bug in the processing of one type of PDF text drawing command, the operators that first move to the next line and then show text (' and " in the PDF specification), so it can be observed in documents with text in any language. Because these operators are rarely used nowadays, though, the bug is hardly ever observed, let alone reported. Your PDF, on the other hand, does make use of this kind of text drawing command.
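    For reference, the PDF specification defines string ' as equivalent to T* followed by string Tj, where T* is the same as 0 -TL Td, and Td sets both the text matrix (Tm) and the text line matrix (Tlm) to [1 0 0 1 tx ty] × Tlm. Showing text with Tj then advances only Tm. The minimal Java sketch below models just that bookkeeping with java.awt.geom.AffineTransform. The class TextStateModel and its methods are hypothetical illustrations, not PDF Clown API, and the horizontal advance of the shown text is passed in directly instead of being computed from glyph widths.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical model of the PDF text-positioning state; NOT PDF Clown's API.
public class TextStateModel {
    AffineTransform tm = new AffineTransform();   // text matrix (Tm)
    AffineTransform tlm = new AffineTransform();  // text line matrix (Tlm)
    double leading;                               // text leading (TL)

    // Td: Tm = Tlm = [1 0 0 1 tx ty] x Tlm (PDF row-vector notation),
    // which is tlm.concatenate(translation) in Java's column-vector notation.
    void td(double tx, double ty) {
        tlm.concatenate(AffineTransform.getTranslateInstance(tx, ty));
        tm = (AffineTransform) tlm.clone();
    }

    // T*: equivalent to "0 -TL Td".
    void tStar() {
        td(0, -leading);
    }

    // Tj: shows text and advances only Tm; Tlm stays at the line start.
    // Assumption: the advance is given directly in text space units.
    void showText(double advance) {
        tm.concatenate(AffineTransform.getTranslateInstance(advance, 0));
    }

    // ' operator: equivalent to T* followed by Tj.
    void nextLineShowText(double advance) {
        tStar();
        showText(advance);
    }

    // Origin of the current line, i.e. where Tlm maps the point (0, 0).
    Point2D lineStart() {
        return tlm.transform(new Point2D.Double(0, 0), null);
    }

    public static void main(String[] args) {
        TextStateModel s = new TextStateModel();
        s.leading = 14;
        s.td(72, 700);           // first line starts at (72, 700)
        s.showText(50);
        s.nextLineShowText(30);  // second line
        s.nextLineShowText(40);  // third line
        Point2D start = s.lineStart();
        System.out.println(start.getX() + ", " + start.getY());
    }
}
```

    Running main prints 72.0, 672.0: after two ' operations the line origin is back at x = 72 and two leadings lower, no matter how wide the shown strings were.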

    The bug is in the ShowText class (package org.pdfclown.documents.contents.objects). At the end of its scan method, if the ShowText instance actually is an instance of the derived class ShowTextToNextLine, the text line matrix in the graphics state is updated like this:

    if(textScanner == null)
    {
      state.setTm(tm);
    
      if(this instanceof ShowTextToNextLine)
      {state.setTlm((AffineTransform)tm.clone());}
    }
    

    Here the text line matrix is set to the text matrix as it is after both the move to the next line and the drawing of the text. This is wrong: it must instead be set to the text matrix right after the move to the next line, before the text is drawn.
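    The effect of this mistake can be illustrated with a small, hypothetical simulation (plain AffineTransform bookkeeping, not PDF Clown code; the start position, leading and text advances are made-up values). It replays three lines drawn with the ' operator and records where each line starts, once with the buggy rule (Tlm copied from Tm after drawing) and once with the correct rule (Tlm only moved down by one leading).

```java
import java.awt.geom.AffineTransform;
import java.util.Arrays;

// Hypothetical comparison of two Tlm update rules for the ' operator.
// Only the matrix bookkeeping is modelled, not PDF Clown itself.
public class TlmUpdateDemo {

    // Returns the x coordinate at which each successive line starts.
    static double[] lineStartsX(boolean buggy, double leading, double... advances) {
        AffineTransform tlm = AffineTransform.getTranslateInstance(72, 700);
        double[] starts = new double[advances.length];
        for (int i = 0; i < advances.length; i++) {
            // Move to the next line: Tm = [1 0 0 1 0 -TL] x Tlm.
            AffineTransform tm = (AffineTransform) tlm.clone();
            tm.concatenate(AffineTransform.getTranslateInstance(0, -leading));
            starts[i] = tm.getTranslateX();
            // Draw the text: only Tm advances.
            tm.concatenate(AffineTransform.getTranslateInstance(advances[i], 0));
            if (buggy) {
                // Buggy rule: Tlm becomes Tm AFTER the text was drawn.
                tlm = (AffineTransform) tm.clone();
            } else {
                // Correct rule: Tlm is only moved down by one leading.
                tlm.concatenate(AffineTransform.getTranslateInstance(0, -leading));
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lineStartsX(true, 14, 50, 30, 40)));
        System.out.println(Arrays.toString(lineStartsX(false, 14, 50, 30, 40)));
    }
}
```

    With the buggy rule the first line still starts at x = 72, but each following line starts where the previous string ended ([72.0, 122.0, 152.0] versus [72.0, 72.0, 72.0]), which is consistent with the first highlight being placed correctly while later ones drift.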

    This can be fixed, e.g., like this:

    if(textScanner == null)
    {
      state.setTm(tm);
    
      if(this instanceof ShowTextToNextLine)
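        // Move the text line matrix down by one line, i.e. by (0, -leading),
        // instead of copying the text matrix, which already includes the
        // advance of the text just drawn.
        state.getTlm().concatenate(new AffineTransform(1, 0, 0, 1, 0, -state.getLead()));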
    }
    
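    A note on the multiplication order in this fix: the PDF specification writes matrices for row vectors, so [1 0 0 1 0 -TL] × Tlm means the translation is applied in text space before Tlm. In java.awt.geom.AffineTransform (column-vector convention) that corresponds to tlm.concatenate(translation), as in the fix, rather than tlm.preConcatenate(translation). The hypothetical check below (class and method names are illustrative only) uses a text line matrix scaled by 2 to make the difference visible.

```java
import java.awt.geom.AffineTransform;

// Hypothetical check of which AffineTransform method matches the PDF rule
// Tlm' = [1 0 0 1 0 -TL] x Tlm (row-vector notation).
public class ConcatOrderDemo {

    static AffineTransform nextLine(AffineTransform tlm, double leading, boolean pre) {
        AffineTransform result = (AffineTransform) tlm.clone();
        AffineTransform move = AffineTransform.getTranslateInstance(0, -leading);
        if (pre) {
            result.preConcatenate(move); // translation applied in user space
        } else {
            result.concatenate(move);    // translation applied in text space
        }
        return result;
    }

    public static void main(String[] args) {
        // Equivalent of "2 0 0 2 100 500 Tm": scale 2, line origin at (100, 500).
        AffineTransform tlm = new AffineTransform(2, 0, 0, 2, 100, 500);
        System.out.println(nextLine(tlm, 10, false).getTranslateY()); // 480.0
        System.out.println(nextLine(tlm, 10, true).getTranslateY());  // 490.0
    }
}
```

    concatenate moves the line origin down by 2 × 10 = 20 user-space units, as the PDF rule requires for a scaled text line matrix, while preConcatenate would move it by only 10.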

    With this change in place the result looks like this:
