Python Camelot borderless table extraction issue

送分小仙女□ 提交于 2019-12-23 18:34:21

问题


I'm trying hard to extract some borderless table as show in the below image which are from pdf files. I have installed python-camelot as shown here and is working fine for bordered tables only. Please find below details:

platform - Linux-4.5.5-300.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four

sys - Python 3.6.1 (default, May 15 2017, 11:42:04)[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]

numpy - NumPy 1.15.4

cv2 - OpenCV 3.4.3

camelot - Camelot 0.3.2


回答1:


To improve the detected area, you can increase the edge_tol (default: 50) value to counter the effect of text being placed relatively far apart vertically. Larger edge_tol will lead to longer textedges being detected, leading to an improved guess of the table area. Let’s use a value of 500.

>>> tables = camelot.read_pdf('edge_tol.pdf', flavor='stream', edge_tol=500)
>>> camelot.plot(tables[0], kind='contour')
>>> plt.show()
>>> tables[0].df



回答2:


Camelot uses lattice by default which relies on clear lines dividing the cells.

For tables without lines you want to use stream:

tables = camelot.read_pdf('your_file_name.pdf', flavor = 'stream')


来源:https://stackoverflow.com/questions/53209335/python-camelot-borderless-table-extraction-issue

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!