Return text string from physical coordinates in a PDF with Python

生来不讨喜 2021-02-03 15:14

I have been battling with Google and the limited documentation of PDFMiner for the last several hours, and although I feel close, I'm just not getting what I need. I've worke

2 answers
  •  猫巷女王i
    2021-02-03 15:50

    I was able to find my way around pdfminer thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py

    In particular, take a look at the method parse_lt_objs(). In its final loop, k should be a pair containing the coordinates of that piece of text (the code then discards it). I don't have a working coordinate extractor to post here (I wasn't interested in the coordinates), but it sounds like you'll have no trouble finding your way from there.
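For anyone landing here later, this is a rough sketch of what such a coordinate extractor could look like. It is not taken from layout_scanner.py. It assumes the pdfminer.six fork, where each layout element exposes a bbox of (x0, y0, x1, y1) in PDF points with the origin at the bottom-left of the page. The helper names (iter_text_boxes, inside, text_in_region) and the file name in the usage comment are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# (x0, y0, x1, y1) in PDF points; the origin is the bottom-left corner of the page.
BBox = Tuple[float, float, float, float]


def iter_text_boxes(pdf_path: str) -> Iterator[Tuple[int, BBox, str]]:
    """Yield (page_index, bbox, text) for every top-level text container.

    Assumes pdfminer.six is installed; it is imported lazily so the pure
    helpers below can be used (and tested) without it.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_index, page_layout in enumerate(extract_pages(pdf_path)):
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                yield page_index, element.bbox, element.get_text()


def inside(bbox: BBox, region: BBox) -> bool:
    """Return True if bbox lies entirely within region."""
    x0, y0, x1, y1 = bbox
    rx0, ry0, rx1, ry1 = region
    return x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1


def text_in_region(boxes: Iterable[Tuple[BBox, str]], region: BBox) -> List[str]:
    """Collect the stripped text of every box fully contained in region."""
    return [text.strip() for bbox, text in boxes if inside(bbox, region)]


# Hypothetical usage against a real file (page 0 only):
# boxes = [(bbox, text) for page, bbox, text in iter_text_boxes("example.pdf") if page == 0]
# print(text_in_region(boxes, (50, 690, 300, 720)))
```

The containment test is deliberately strict; if the text you want straddles the edge of the rectangle, an overlap test may suit you better.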

    Good luck with it!
