How to get the Unicode values of characters from a PDF using Java and PDFBox

死守一世寂寞 2021-01-19 14:26

I am using Apache PDFBox and Java to parse PDFs and extract all the information from them. Text extraction works fine for English only. For other languages I get only som

2 answers
  •  情话喂你
    2021-01-19 14:40

    http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/pdfbox/1.6.0/org/apache/pdfbox/util/PDFText2HTML.java

    The private method String escape(String chars) in that class (PDFText2HTML) converts characters to their Unicode values.
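
As a rough illustration of the idea in the answer above, here is a minimal sketch that assumes the text has already been extracted from the PDF into an ordinary Java String. It is not the PDFBox source: the class name UnicodeEscapeSketch and the methods toCodePoints and escapeNonAscii are hypothetical names made up for this example, and the numeric-reference output format is an assumption about what such an escape step might produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class UnicodeEscapeSketch {

    // Lists every code point of the text as U+XXXX, separated by spaces.
    static String toCodePoints(String text) {
        List<String> parts = new ArrayList<>();
        // codePoints() merges surrogate pairs into a single code point
        text.codePoints().forEach(cp -> parts.add(String.format("U+%04X", cp)));
        return String.join(" ", parts);
    }

    // Replaces everything outside printable ASCII with a decimal
    // numeric reference (assumed format: &#NNN;).
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 32 || cp > 126) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "A\u00e9"; // 'A' followed by e-acute
        System.out.println(toCodePoints(sample));
        System.out.println(escapeNonAscii(sample));
    }
}
```

Working on code points rather than char values keeps characters outside the Basic Multilingual Plane intact, which matters for some non-English scripts and symbols.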
