What's the best way to import/read data from pdf files?

暗喜 2021-01-01 04:59

We get a large amount of data from our clients in PDF files in varying formats [layout-wise]. These files are typically report output and are usually properly annotated [

4 answers
  • 2021-01-01 05:38

    We use Xpdf in one of our applications. It's a C++ library that is primarily used for PDF rendering, although it also has a text extractor that could be useful for this project.
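As a rough illustration of the text extractor mentioned above, here is a minimal Python sketch that shells out to the `pdftotext` command-line tool that ships with Xpdf. It assumes `pdftotext` is installed and on the PATH; the function names `build_pdftotext_cmd` and `extract_text` are made up for this example, and whether `-layout` helps will depend on how your reports are laid out.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, layout=True):
    """Build the argument list for the pdftotext command-line tool."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout asks pdftotext to keep the physical layout (columns) of the page
        cmd.append("-layout")
    # "-" as the output file makes pdftotext write the text to stdout
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Return the text of a PDF, or None if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `extract_text("report.pdf")` (a hypothetical file name) would return the page text as one string, which you can then split into lines and parse per report format.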

  • 2021-01-01 05:44

    pdftohtml -xml

    although pdftoipe seems to give more detailed output.
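To show why the XML output is handy for report-style data, here is a small Python sketch that groups the positioned text fragments into rows. It assumes the XML has `<page>` elements containing `<text>` elements with integer `top` and `left` attributes, which is what `pdftohtml -xml` typically emits; the sample data, the function name `rows_from_pdf2xml`, and the 3-unit `tolerance` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape pdftohtml -xml usually produces
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="300" width="40" height="12" font="0">Total</text>
<text top="100" left="50" width="60" height="12" font="0"><b>Invoice</b></text>
<text top="130" left="50" width="30" height="12" font="0">Widget</text>
<text top="131" left="300" width="30" height="12" font="0">42.00</text>
</page>
</pdf2xml>"""


def rows_from_pdf2xml(xml_text, tolerance=3):
    """Group text fragments into rows per page, ordered left to right."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        cells = []
        for node in page.iter("text"):
            # itertext() also picks up text nested in <b>, <i> or <a>
            content = "".join(node.itertext()).strip()
            if content:
                cells.append((int(node.get("top")), int(node.get("left")), content))
        cells.sort()
        rows = []
        row_top = None
        for top, left, content in cells:
            # Start a new row when the fragment sits clearly below the current row
            if row_top is None or top - row_top > tolerance:
                rows.append([])
                row_top = top
            rows[-1].append((left, content))
        pages.append([[text for _, text in sorted(row)] for row in rows])
    return pages


if __name__ == "__main__":
    for row in rows_from_pdf2xml(SAMPLE_XML)[0]:
        print(row)
```

Because each fragment carries its coordinates, you can define per-client rules (for example, "the column starting near left=300 is the amount") instead of guessing from whitespace.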

  • 2021-01-01 05:51

    If you're fine with calling something external, you can use Ghostscript: look at the ps2ascii script included with the distribution. I'm not sure what you want from a graphical tool. A big button that you push to choose the input and output files? A preview? You might be able to use GSView, depending on what you want.

  • 2021-01-01 05:58

    Have you looked at Aspose? We're using it for an ASP.NET app, and I've seen some examples of VBScript using it as well. It's not particularly expensive either.

    http://www.aspose.com/
