Python PDF Parsing with Camelot: Extract the Table Title

执念已碎 2021-01-21 14:07

Camelot is a fantastic Python library for extracting tables from a PDF file as DataFrames. However, I'm looking for a solution that also returns the table description text wr…

1 answer
  • 2021-01-21 14:28

    You can create the Lattice parser directly instead of going through camelot.read_pdf, and run it page by page:

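        from camelot.parsers import Lattice
        kwargs = {}              # Lattice options, e.g. the ones you would pass to camelot.read_pdf
        suppress_stdout = False
        layout_kwargs = {}       # pdfminer layout parameters, e.g. {"char_margin": 1.0}
        pages = ["page-1.pdf"]   # hypothetical: paths to single-page PDFs (read_pdf splits pages like this)
        tables = []
        parser = Lattice(**kwargs)
        for p in pages:
            t = parser.extract_tables(p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs)
            tables.extend(t)  # parser.layout now holds the layout of page p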

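    After each call, parser.layout holds the pdfminer layout (an LTPage) of the page that was just parsed, which contains all the components on that page. Every component has a bbox (x0, y0, x1, y1), and each extracted table records its own bounding box too (in camelot this is the private attribute table._bbox). Because PDF coordinates start at the bottom-left corner and y grows upward, the table title is the text component whose bottom edge (y0) lies at or above the table's top edge and closest to it; find that component and read its text.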
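    The lookup described above can be sketched as a small helper. This is a minimal sketch, not part of camelot: find_title_above is a hypothetical name, and it assumes the children of parser.layout are pdfminer layout objects that expose bbox and, for text boxes, get_text(), with both boxes in PDF coordinates (origin bottom-left). Since the corner order of the private table._bbox is an internal detail, the helper simply treats the larger of its two y values as the table's top edge.

```python
from typing import Iterable, Optional, Sequence


def find_title_above(components: Iterable, table_bbox: Sequence[float]) -> Optional[str]:
    """Hypothetical helper: text of the nearest text component above a table.

    components: objects with .bbox = (x0, y0, x1, y1) in PDF coordinates
        (origin bottom-left, y grows upward), e.g. the children of parser.layout.
    table_bbox: the table's bounding box, e.g. table._bbox (assumed PDF coordinates).
    """
    # Normalise the table box so the corner order does not matter.
    tx0, tx1 = min(table_bbox[0], table_bbox[2]), max(table_bbox[0], table_bbox[2])
    table_top = max(table_bbox[1], table_bbox[3])

    best, best_gap = None, float("inf")
    for comp in components:
        if not hasattr(comp, "get_text"):
            continue  # lines, rects, images and figures carry no text
        x0, y0, x1, y1 = comp.bbox
        if y0 < table_top:
            continue  # not entirely above the table (e.g. a cell inside it)
        if x1 < tx0 or x0 > tx1:
            continue  # no horizontal overlap with the table
        gap = y0 - table_top  # vertical distance above the table's top edge
        if gap < best_gap:
            best, best_gap = comp, gap
    return best.get_text().strip() if best is not None else None
```

    Inside the loop from the answer you could call find_title_above(parser.layout, table._bbox) for each table returned by extract_tables, before the next page is parsed, because parser.layout is replaced on every call. Requiring horizontal overlap is a design choice that avoids picking up text from a neighbouring column; drop that check if your titles can be offset from the table.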
