Return text string from physical coordinates in a PDF with Python

生来不讨喜 2021-02-03 15:14

I have been battling with Google and the limited documentation of PDFMiner for the last several hours, and although I feel close, I'm just not getting what I need. I've worke

2 answers

猫巷女王i (OP)

2021-02-03 15:50

I was able to find my way around pdfminer thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py

In particular, take a look at the method parse_lt_objs(). In its final loop, k should be a pair containing the coordinates of that bit of text (the loop discards it). I don't have a working coordinate extractor to post here (I was not interested in the coordinates), but it sounds like you'll have no trouble finding your way from there.
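For what it's worth, here is a minimal sketch of the coordinate idea. It is not taken from layout_scanner.py. It assumes the newer pdfminer.six package and its extract_pages helper; the names boxes_overlap, select_text and text_in_region are made up for illustration. Each layout object carries a bbox of (x0, y0, x1, y1) in PDF points with the origin at the bottom-left of the page, so picking text by physical coordinates comes down to a rectangle overlap test.

```python
from typing import Iterable, List, Tuple

# (x0, y0, x1, y1) in PDF points, origin at the bottom-left corner of the page.
BBox = Tuple[float, float, float, float]


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """Return True if rectangles a and b share any area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_text(items: Iterable[Tuple[BBox, str]], region: BBox) -> List[str]:
    """Keep the text of every (bbox, text) pair whose box overlaps region."""
    return [text for bbox, text in items if boxes_overlap(bbox, region)]


def text_in_region(pdf_path: str, page_number: int, region: BBox) -> str:
    """Hypothetical helper: text of the layout boxes overlapping region on one page (0-based)."""
    # Imported lazily so the geometry helpers above work without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for layout in extract_pages(pdf_path, page_numbers=[page_number]):
        items = [
            (element.bbox, element.get_text())
            for element in layout
            if isinstance(element, LTTextContainer)
        ]
        return "".join(select_text(items, region))
    return ""  # the requested page does not exist
```

Something like text_in_region("some.pdf", 0, (50, 700, 300, 750)) would then return the text of whatever boxes fall in that rectangle on the first page; the path and numbers are placeholders. The match is per text box, so a box that only partly overlaps the region is returned whole.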

Good luck with it!
