I have been trying to extract the attributes(font, font size, color etc.) of each word in a pdf document using iText library. I could extract the text from every page but not th
I'm not a Java person so I can't give you working code but hopefully I can get you 95% of the way there.
First you'll need to create a class that implements the interface com.itextpdf.text.pdf.parser.TextExtractionStrategy
Then you can pass an instance of this class as the third parameter to:
PdfTextExtractor.getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy)
One of the methods of that interface is renderText which gets called for every text block that gets processed. When it gets called a TextRenderInfo gets passed in which has a method called getFont which should give you what you're looking for. Store the contents of that in a buffer of some sort and after getTextFromPage
is called you can inspect that buffer to see each font. If you want to see an example of implementing that interface lookup the code for SimpleTextExtractionStrategy online. Otherwise here's a C# version that pretty much does what you're looking for.