iTextSharp: Find coordinates of specific text in PDF

南笙 2020-12-22 03:28

I have found many sites and posts asking the same question as mine, but what they all seem to have in common is that people answer them with examples of how to inse

3 answers
  • 2020-12-22 03:46

    First, if the text is English only, it is easy to parse. If your documents are not in English, however, you need to understand exactly how the font for your language maps to Unicode.

  • 2020-12-22 03:49

    You can make use of the parser package of iText (Sharp) to find the position of a given text. You do have to implement your own RenderListener, though, as the main use case of that package is text extraction, not text position finding.

    It is not as easy as you might think because, for example, the individual characters of a word may arrive separately and in any order.
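
    To make the ordering problem concrete, here is a minimal sketch in plain Java, not tied to the iText API. The `Chunk` and `ChunkLocator` classes and the `LINE_TOLERANCE` value are hypothetical names introduced for illustration: `Chunk` stands in for the text and baseline start point a render callback might report. The sketch groups chunks into lines by their y coordinate, orders each line left to right, joins the text, and only then searches for the target string.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for what a text render callback reports:
// a piece of text plus the start point of its baseline in PDF user space.
final class Chunk {
    final String text;
    final float x;
    final float y;

    Chunk(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class ChunkLocator {
    // Assumed tolerance: baselines closer than this are treated as the same line.
    private static final float LINE_TOLERANCE = 1.0f;

    // Returns {x, y} of the chunk in which the first match of needle starts, or null.
    static float[] locate(List<Chunk> chunks, String needle) {
        List<Chunk> byY = new ArrayList<>(chunks);
        // PDF y grows upwards, so a higher y means a line nearer the top of the page.
        byY.sort(Comparator.comparingDouble((Chunk c) -> c.y).reversed());

        List<Chunk> line = new ArrayList<>();
        for (Chunk c : byY) {
            // A jump in y larger than the tolerance closes the current line.
            if (!line.isEmpty() && Math.abs(line.get(0).y - c.y) > LINE_TOLERANCE) {
                float[] hit = searchLine(line, needle);
                if (hit != null) {
                    return hit;
                }
                line.clear();
            }
            line.add(c);
        }
        return line.isEmpty() ? null : searchLine(line, needle);
    }

    private static float[] searchLine(List<Chunk> line, String needle) {
        // Restore reading order within the line, then join without adding spaces.
        line.sort(Comparator.comparingDouble((Chunk c) -> c.x));
        StringBuilder joined = new StringBuilder();
        for (Chunk c : line) {
            joined.append(c.text);
        }
        int index = joined.indexOf(needle);
        if (index < 0) {
            return null;
        }
        // Walk the chunks again to find the one that contains character 'index'.
        int offset = 0;
        for (Chunk c : line) {
            if (index < offset + c.text.length()) {
                return new float[] {c.x, c.y};
            }
            offset += c.text.length();
        }
        return null;
    }
}
```

    Calling `ChunkLocator.locate(chunks, "Signature")` on chunks that arrived as "ture" followed by "Signa" still finds the word, because the search runs only after the line has been put back in order. The returned point is the start of the chunk in which the match begins, not the exact character position, and the sketch does not insert spaces for horizontal gaps between chunks; a real implementation would probably need both refinements.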

    PS:

    First you will have to find out, though, whether the line for the signature consists of characters (as your question seems to imply) or whether it is a drawn path. Additionally you will have to find out whether that line is unique in the document.

    In the former case, the RenderListener implementation you need has to inspect the TextRenderInfo objects forwarded for processing in its RenderText method. If the text content of such an object contains the unique characters that make up the signature line, you have to store the position data of that TextRenderInfo. If the line characters are not unique, you will have to find some additional criterion that makes them unique, e.g. some preceding string, or possibly the fact that it is the last occurrence of those characters in the document.
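
    The store-on-match logic described above could be sketched roughly as follows, again in plain Java without the iText dependency. `SignatureLineFinder`, its `renderText(page, text, x, y)` method and the `Hit` class are hypothetical stand-ins: a real listener would receive a TextRenderInfo and read the text and baseline start point from it (in iTextSharp 5.x, typically via `GetText()` and `GetBaseline().GetStartPoint()`). The sketch assumes the marker characters arrive in a single chunk, and it uses "last occurrence in the document" as the uniqueness criterion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener-style class; not part of iText.
final class SignatureLineFinder {
    // Position data stored for one matching chunk.
    static final class Hit {
        final int page;
        final float x;
        final float y;

        Hit(int page, float x, float y) {
            this.page = page;
            this.x = x;
            this.y = y;
        }
    }

    private final String marker;
    private final List<Hit> hits = new ArrayList<>();

    SignatureLineFinder(String marker) {
        this.marker = marker;
    }

    // Plays the role of the RenderText callback: inspect the chunk's text
    // and keep its position when it contains the marker characters.
    void renderText(int page, String text, float x, float y) {
        if (text.contains(marker)) {
            hits.add(new Hit(page, x, y));
        }
    }

    // Assumed uniqueness criterion: the match on the highest page number and,
    // on that page, the lowest one (PDF y grows upwards). Returns null if none.
    Hit lastOccurrence() {
        Hit last = null;
        for (Hit h : hits) {
            if (last == null || h.page > last.page
                    || (h.page == last.page && h.y < last.y)) {
                last = h;
            }
        }
        return last;
    }
}
```

    After feeding every chunk of every page through `renderText`, `lastOccurrence()` gives the page and coordinates to use for the signature line. Choosing by page and y rather than by callback order is a deliberate choice here, since chunks are not guaranteed to arrive in visual order.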

    In the latter case the parser package functionality has to be somewhat extended as it currently does not report paths. According to the iText mailing list, an extension like that is on the ToDo list.

  • 2020-12-22 03:54

    This question isn't directly related to what you want to accomplish, but it is indirectly related.

    JCIS posted a great application that shows the very arduous task of locating specific text, albeit in VB. It wouldn't be as simple as running it through a VB-to-C# converter, but it should be translatable. This may seem like an easy task, but PDF is technically a display format, not a document format. The difference between the two is what makes this such a long process.
