I'm using PDFBox to extract text from a document by extending PDFTextStripper. I've noticed that some of these documents contain invisible characters that are being extrac