pdftotext

IText reading PDF like pdftotext -layout?

不羁的心 submitted on 2019-11-27 05:41:23
I'm looking for the easiest way to implement a Java solution whose output is quite similar to that of pdftotext -layout FILE on Linux machines (and of course it should be cheap as well). I have tried code snippets for iText, PDFBox and PDFTextStream. The most accurate solution so far is PDFTextStream, which uses VisualOutputTarget to produce a faithful representation of my file: my column layout is recognized correctly and I'm able to work with it. But there should also be a solution for iText, right? Every simple snippet I found produces plain ordered strings which are a mess (mess up row/column
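The core idea behind a layout-preserving dump like pdftotext -layout can be sketched in plain Java: collect each text run together with its position, group runs into lines by their y coordinate, then pad each run out to a column computed from its x coordinate. This is a minimal sketch under stated assumptions, not pdftotext's actual algorithm or any library's API. It assumes the runs have already been extracted (for example by an iText or PDFBox text-position listener, which is not shown), and `TextChunk`, `render`, the fixed `charWidth` and the `lineTolerance` bucketing are all hypothetical simplifications. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class LayoutSketch {
    // Hypothetical input: a run of text with its starting x and baseline y in PDF user-space units.
    record TextChunk(String text, float x, float y) {}

    /**
     * Renders chunks into a monospaced grid in the spirit of `pdftotext -layout`:
     * chunks are bucketed into lines by y (top of the page first) and each chunk
     * is padded out to a column derived from its x position.
     */
    static String render(List<TextChunk> chunks, float charWidth, float lineTolerance) {
        // In default PDF user space y grows upward, so larger y means higher on the page;
        // reverse ordering therefore emits the topmost line first.
        TreeMap<Integer, List<TextChunk>> lines = new TreeMap<>(Comparator.reverseOrder());
        for (TextChunk c : chunks) {
            // Simplification: chunks whose y values round into the same bucket share a line.
            int key = Math.round(c.y() / lineTolerance);
            lines.computeIfAbsent(key, k -> new ArrayList<>()).add(c);
        }
        StringBuilder out = new StringBuilder();
        for (List<TextChunk> line : lines.values()) {
            line.sort(Comparator.comparingDouble(TextChunk::x));
            StringBuilder row = new StringBuilder();
            for (TextChunk c : line) {
                int column = Math.round(c.x() / charWidth);
                // If the previous chunk already reaches this column, keep one separating space.
                if (row.length() > 0 && row.length() >= column) {
                    row.append(' ');
                }
                while (row.length() < column) {
                    row.append(' ');
                }
                row.append(c.text());
            }
            out.append(row).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<TextChunk> chunks = List.of(
            new TextChunk("Name", 0f, 700f),
            new TextChunk("Price", 100f, 700f),
            new TextChunk("Apple", 0f, 688f),
            new TextChunk("1.20", 100f, 688f));
        System.out.print(render(chunks, 5f, 4f));
    }
}
```

With the sample data, main prints two rows in which "Price" and "1.20" both start at column 20, so the table columns line up. The fixed character width is the main simplification: real PDFs mix font sizes, so a fuller implementation would derive the column scale from the fonts on the page, and the rounding-based line grouping can split a line whose baselines straddle a bucket boundary.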

itext java pdf to text creation

与世无争的帅哥 submitted on 2019-11-26 14:57:51
Question: I use iText to convert a PDF to a text file. It actually works well, but for some words it does the following: for example, the PDF contains a phrase like "present the main ideas", but iText produces output like "presentthemainideas". Is there any way to correct this behaviour? String pdf="/home/can/Downloads/NLP/textSummarization/A New Approach for Multi-Document Update Summarization.pdf"; String txt="/home/can/myWorkSpace/PDFConverterProject/outputs/bb.txt"; StringBuffer text=new
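Run-together words like "presentthemainideas" usually mean the PDF positions each word with explicit offsets instead of storing a space character, so the extractor has to infer word breaks from the horizontal gaps between glyph runs. The following is a minimal sketch of that kind of heuristic, not iText's implementation. It assumes the runs on one line are already sorted left to right and that any gap wider than half a space width counts as a word break. `Glyphs`, `join` and the half-space threshold are hypothetical choices made for illustration. Records require Java 16 or later.

```java
import java.util.List;

public class SpaceInsertion {
    // Hypothetical input: a glyph run on one text line with its horizontal extent.
    record Glyphs(String text, float startX, float endX) {}

    /** Joins runs left to right, inserting a space where the gap looks like a word break. */
    static String join(List<Glyphs> runs, float spaceWidth) {
        StringBuilder sb = new StringBuilder();
        Glyphs prev = null;
        for (Glyphs g : runs) {
            if (prev != null) {
                float gap = g.startX() - prev.endX();
                // Avoid doubling a space the PDF already contains.
                boolean alreadySpaced = (sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ')
                        || g.text().startsWith(" ");
                // Assumed threshold: a gap wider than half a space width is a word boundary.
                if (gap > spaceWidth / 2f && !alreadySpaced) {
                    sb.append(' ');
                }
            }
            sb.append(g.text());
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyphs> runs = List.of(
            new Glyphs("present", 72f, 105f),
            new Glyphs("the", 108f, 122f),
            new Glyphs("main", 125f, 146f),
            new Glyphs("ideas", 149f, 172f));
        System.out.println(join(runs, 2.5f));
    }
}
```

Here each gap is 3 units against a threshold of 1.25, so main prints "present the main ideas", while runs that touch (a gap of 0, as when a single word is split into several runs) are joined without a space. The threshold is the design choice that matters: too small and kerned letters get split apart, too large and tightly set words run together again.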

IText reading PDF like pdftotext -layout?

a 夏天 submitted on 2019-11-26 12:48:47
Question: I'm looking for the easiest way to implement a Java solution whose output is quite similar to that of pdftotext -layout FILE on Linux machines (and of course it should be cheap as well). I have tried code snippets for iText, PDFBox and PDFTextStream. The most accurate solution so far is PDFTextStream, which uses VisualOutputTarget to produce a faithful representation of my file: my column layout is recognized correctly and I'm able to work with it. But there should also be a solution for