Getting text fonts from a PDF file using iText


I have been trying to extract the attributes (font, font size, color, etc.) of each word in a PDF document using the iText library. I could extract the text from every page but not th


1 Answer

梦如初夏

2021-01-25 17:47

I'm not a Java person so I can't give you working code but hopefully I can get you 95% of the way there.

First, you'll need to create a class that implements the interface com.itextpdf.text.pdf.parser.TextExtractionStrategy.

Then you can pass an instance of this class as the third parameter to:

PdfTextExtractor.getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy)

One of the methods of that interface is renderText, which gets called for every chunk of text that is processed. Each call receives a TextRenderInfo, which has a method called getFont that should give you what you're looking for.

Store the result of getFont in a buffer of some sort. After getTextFromPage returns, you can inspect that buffer to see the font of each chunk.

If you want to see an example of implementing that interface, look up the code for SimpleTextExtractionStrategy online. Otherwise, here's a C# version that does pretty much what you're looking for.
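For a Java flavor of the buffering idea described above, here is a minimal sketch. It does not depend on iText: ChunkInfo and ChunkListener are hypothetical stand-ins for TextRenderInfo and the renderText callback, and FontCollectingStrategy and its accessors are names made up for illustration. It only shows the shape of "record the font on every renderText call, inspect the buffer afterwards".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for iText's TextRenderInfo: one chunk of text plus its font name.
interface ChunkInfo {
    String getText();
    String getFontName();
}

// Hypothetical stand-in for the renderText callback of a text extraction strategy.
interface ChunkListener {
    void renderText(ChunkInfo info);
}

// Buffers the text and the font name of every chunk passed to renderText.
class FontCollectingStrategy implements ChunkListener {
    private final StringBuilder text = new StringBuilder();
    private final List<String> fontPerChunk = new ArrayList<>();

    @Override
    public void renderText(ChunkInfo info) {
        text.append(info.getText());
        fontPerChunk.add(info.getFontName());
    }

    // All text seen so far, in the order the chunks arrived.
    String getResultantText() {
        return text.toString();
    }

    // One font name per chunk, in arrival order.
    List<String> getFontPerChunk() {
        return fontPerChunk;
    }

    // Distinct font names, in order of first appearance.
    Set<String> getDistinctFonts() {
        return new LinkedHashSet<>(fontPerChunk);
    }
}

class FontExtractionDemo {
    // Builds a fake chunk; a real parser would supply these while reading a page.
    static ChunkInfo chunk(String text, String fontName) {
        return new ChunkInfo() {
            public String getText() { return text; }
            public String getFontName() { return fontName; }
        };
    }

    public static void main(String[] args) {
        FontCollectingStrategy strategy = new FontCollectingStrategy();

        // Simulate the parser calling renderText once per chunk.
        strategy.renderText(chunk("Hello ", "Helvetica-Bold"));
        strategy.renderText(chunk("world", "Times-Roman"));
        strategy.renderText(chunk("!", "Helvetica-Bold"));

        // After "extraction", inspect the buffers.
        System.out.println(strategy.getResultantText());
        System.out.println(strategy.getDistinctFonts());
    }
}
```

With iText 5 itself, the class would implement TextExtractionStrategy instead of the stand-in interface, renderText would receive a real TextRenderInfo, and the font name would come from the object returned by getFont (as far as I know, via getPostscriptFontName). The interface also requires beginTextBlock, endTextBlock, renderImage and getResultantText, which can mostly be left empty for this purpose.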
