How to get the Unicode values of characters from a PDF using Java and PDFBox

死守一世寂寞 2021-01-19 14:26

I am using Apache PDFBox and Java to parse PDFs and extract all the information from them. Text extraction works correctly only for English. For other languages I get only som

2 Answers
  • 2021-01-19 14:40

    http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/pdfbox/1.6.0/org/apache/pdfbox/util/PDFText2HTML.java

    The private method String escape(String chars) in that class converts characters to Unicode.
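To make the idea concrete, here is a minimal standalone sketch of that kind of conversion. It is not the PDFBox method itself: the class UnicodeDump and the methods codePoints and escapeNonAscii are hypothetical names used only for illustration, and the sketch assumes the text has already been extracted into a Java String (for example with PDFTextStripper). It lists the code point of each character and writes anything outside printable ASCII as a numeric HTML reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class UnicodeDump {

    /** Returns the Unicode code points of text in "U+XXXX" notation. */
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        // codePoints() handles surrogate pairs, unlike iterating over char values.
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    /** Replaces every character outside printable ASCII with a numeric HTML reference. */
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 32 || cp > 126) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: Latin "A", "e" with acute accent, Devanagari letter NA.
        String sample = "A\u00E9\u0928";
        System.out.println(codePoints(sample));
        System.out.println(escapeNonAscii(sample));
    }
}
```

If the extracted String already contains the wrong characters, for instance because the PDF's fonts carry no usable Unicode mapping, escaping will only show those wrong values; it can help diagnose the problem but will not repair it.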

  • 2021-01-19 14:54

    Try changing the Java default locale. Doing this from within your Java program should be equivalent to changing the OS setting.
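A minimal sketch of changing the locale from inside the program might look like the following. The class LocaleSwitch, the method useLocale, and the language tag "hi-IN" are assumptions chosen for illustration, and nothing in this thread confirms that the locale actually affects PDFBox text extraction, so treat it as an experiment rather than a known fix.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox.
final class LocaleSwitch {

    /** Sets the JVM default locale and returns the previous one so it can be restored. */
    static Locale useLocale(Locale wanted) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(wanted);
        return previous;
    }

    public static void main(String[] args) {
        // "hi-IN" is only an example tag; substitute the language of the PDF.
        Locale previous = useLocale(Locale.forLanguageTag("hi-IN"));
        try {
            System.out.println("Default locale: " + Locale.getDefault().toLanguageTag());
            // Run the PDF text extraction here.
        } finally {
            // Restore the original default so other code is unaffected.
            Locale.setDefault(previous);
        }
    }
}
```

The same default can also be set at JVM startup with the user.language and user.country system properties instead of calling Locale.setDefault in code.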
