pdf-parsing | 易学教程

Ruby: Reading PDF files

Question: I'm looking for a fast and reliable way to read and parse large PDF files in Ruby (on Linux and OS X). So far I've found the rather old and simple PDF-toolkit (a pdftotext wrapper) and PDF-reader, which was unable to read most of my files, even though both libraries provide exactly the functionality I'm looking for. My question: have I missed something? Is there a tool that is better suited (faster and more reliable) to this problem?

Answer 1: You might find Docsplit useful: Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8
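Since PDF-toolkit is described above as a pdftotext wrapper, one low-dependency option is to call Poppler's `pdftotext` command directly from Ruby. The sketch below is a minimal illustration, assuming `pdftotext` is installed and on the `PATH`; the helper names `pdftotext_command` and `extract_text` are hypothetical and are not part of PDF-toolkit, PDF-reader, or Docsplit.

```ruby
require "open3"

# Build the argument vector for Poppler's pdftotext CLI (hypothetical helper).
# "-enc UTF-8" selects the output encoding, "-layout" keeps the physical
# column layout, "-f"/"-l" restrict the page range, and a final "-" sends
# the text to stdout instead of a file.
def pdftotext_command(pdf_path, layout: false, first_page: nil, last_page: nil)
  cmd = ["pdftotext", "-enc", "UTF-8"]
  cmd << "-layout" if layout
  cmd += ["-f", first_page.to_s] if first_page
  cmd += ["-l", last_page.to_s] if last_page
  cmd + [pdf_path, "-"]
end

# Run pdftotext and return the extracted text (hypothetical helper).
# Raises RuntimeError if pdftotext exits unsuccessfully; Open3 itself raises
# Errno::ENOENT if the pdftotext binary is not installed.
def extract_text(pdf_path, **opts)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path, **opts))
  raise "pdftotext failed for #{pdf_path}: #{stderr}" unless status.success?
  stdout
end
```

For example, `extract_text("report.pdf", layout: true, first_page: 1, last_page: 5)` would return the first five pages of a hypothetical `report.pdf` as a string. Passing the command as an argument array rather than a single shell string avoids quoting problems with unusual file names.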

Extracting table contents from a collection of PDF files [closed]

Question (closed four years ago as needing more focus): I have a stack of PDFs, potentially hundreds or thousands. They are not all formatted the same, but any of them MAY have one or more tables with interesting information that I would like to collect into a separate database. Of course, I know I have to write something to do this.
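Because the PDFs are not formatted consistently, there is no single reliable way to locate their tables. One simple heuristic, offered here as an assumption rather than a general solution, is to extract each file with `pdftotext -layout` (which preserves column positions) and treat any line that splits into several cells on runs of two or more spaces as a table row. The names `split_layout_line`, `table_rows`, and `tables_in_directory`, and the `min_columns` threshold, are hypothetical choices for illustration; the sketch also assumes Poppler's `pdftotext` is on the `PATH`.

```ruby
require "open3"

# Split one line of `pdftotext -layout` output into cells, treating any run of
# two or more spaces as a column gap. Heuristic: a cell that itself contains
# two consecutive spaces will be split in two.
def split_layout_line(line)
  line.strip.split(/ {2,}/)
end

# Keep only lines that look like table rows, i.e. have at least min_columns cells.
def table_rows(layout_text, min_columns: 3)
  layout_text.each_line
             .map { |line| split_layout_line(line) }
             .select { |cells| cells.size >= min_columns }
end

# Walk a directory tree, run `pdftotext -layout` on every PDF, and collect the
# candidate rows per file. Files pdftotext cannot read map to an empty array.
def tables_in_directory(dir, min_columns: 3)
  Dir.glob(File.join(dir, "**", "*.pdf")).to_h do |path|
    text, status = Open3.capture2("pdftotext", "-layout", path, "-")
    rows = status.success? ? table_rows(text, min_columns: min_columns) : []
    [path, rows]
  end
end
```

The resulting hash maps each PDF path to its candidate rows, which could then be inserted into a database. Tables with merged cells, wrapped text, or single-space column gaps will defeat this heuristic, so the threshold and splitting rule would need tuning against real samples from the collection.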
