pdf-parsing

Ruby: Reading PDF files

穿精又带淫゛_ submitted on 2019-11-27 10:26:23
I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX). So far I've found the rather old and simple PDF-toolkit (a pdftotext wrapper) and PDF-reader, which was unable to read most of my files, although both libraries provide exactly the functionality I was looking for. My question: Have I missed something? Is there a tool that is better suited (faster and more reliable) to solve my problem? You might find Docsplit useful: Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8

Extracting table contents from a collection of PDF files [closed]

ⅰ亾dé卋堺 submitted on 2019-11-26 23:58:42
Question: I have a stack of PDFs, potentially hundreds or thousands. They are not all formatted the same, but any of them may contain one or more tables with interesting information that I would like to collect into a separate database. Of course, I know I have to write something to do this.
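One possible starting point for the "write something" step, sketched in Ruby under stated assumptions: the `pdftotext` command-line tool (poppler-utils) is installed, its `-layout` mode keeps table columns aligned with runs of spaces, and a line counts as a table row when it splits into enough cells. The helper names `pdf_layout_text`, `split_cells`, and `table_rows` are hypothetical, and the heuristic will miss tables whose cells contain wide gaps or wrap across lines.

```ruby
require "open3"

# Run the pdftotext CLI and return the document text.
# "-layout" preserves the physical layout; "-" writes to stdout.
# Assumes pdftotext is on the PATH.
def pdf_layout_text(path)
  out, err, status = Open3.capture3("pdftotext", "-layout", "-enc", "UTF-8", path, "-")
  raise "pdftotext failed: #{err}" unless status.success?
  out
end

# Split one line into cells wherever two or more spaces occur.
def split_cells(line)
  line.strip.split(/ {2,}/)
end

# Keep only lines that look like table rows (at least min_cols cells).
def table_rows(text, min_cols: 2)
  text.each_line.map { |line| split_cells(line) }
      .select { |cells| cells.size >= min_cols }
end

# Demonstration on inline sample text, so no PDF is needed to try it.
sample = <<~TEXT
  Quarterly report

  Region      Units    Revenue
  North       120      4500
  South       95       3100
TEXT

table_rows(sample, min_cols: 3).each { |row| puts row.join(" | ") }
```

For a real file, `table_rows(pdf_layout_text("some.pdf"))` would feed the extracted text through the same filter; the resulting arrays could then be written to whatever database is in use.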

Ruby: Reading PDF files

白昼怎懂夜的黑 submitted on 2019-11-26 17:56:30
Question: I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX). So far I've found the rather old and simple PDF-toolkit (a pdftotext wrapper) and PDF-reader, which was unable to read most of my files, although both libraries provide exactly the functionality I was looking for. My question: Have I missed something? Is there a tool that is better suited (faster and more reliable) to solve my problem? Answer 1: You might find Docsplit useful: Docsplit is a