How to retrieve particular table data in multiple tables using python-docx?

Posted by 随声附和 on 2020-12-13 03:29:42

Question


I am using python-docx to extract data from a particular table in a Word file. The file contains multiple tables. I need to retrieve the data from one particular table among them and arrange it in a given layout.

Challenges:

  1. Can I find a particular table in a Word file using python-docx?
  2. Can I achieve my requirement using python-docx?

Answer 1:


This is not a complete answer, but it should point you in the right direction. It is based on a similar task I have been working on.

I ran the following code in Python 3.6 in a Jupyter notebook, but it should work in plain Python as well.

First we import Document from docx and point it at the document we want to work with.

from docx.api import Document

document = Document("<your path to doc>")

We get the list of tables in the document and print how many there are. We also create a list to hold all the tabular data.

tables = document.tables

print(len(tables))

big_data = []

Next we loop through the tables:

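for table in document.tables:
    data = []  # one dict per data row of this table
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)

        # The first row holds the column headings; use them as the dict keys.
        if i == 0:
            keys = tuple(text)
            continue
        row_data = dict(zip(keys, text))
        data.append(row_data)
    # Append once per table, after all of its rows have been read.
    big_data.append(data)
print(big_data)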

By looping through all the tables, we read the data into a list of lists. Each inner list represents one table and holds one dictionary per data row. In each dictionary, the keys are the column headings from the table's first row and the values are the cell contents of that row for the corresponding columns.
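
Given that structure, one way to pick out a particular table is to look for the one whose column headings include the ones you expect. The following is only a sketch under that assumption: find_table_by_headers is a hypothetical helper, not part of python-docx, and the sample data is made up to mirror the list / list / dictionary layout described above. Headings are stripped before comparison because cell text can carry stray whitespace.

```python
def find_table_by_headers(big_data, required_headers):
    """Return the first table (list of row dicts) that has all required headings.

    Hypothetical helper for illustration; returns None if no table matches.
    """
    wanted = {header.strip() for header in required_headers}
    for table_rows in big_data:
        if not table_rows:
            continue  # a table with only a heading row produced no row dicts
        present = {key.strip() for key in table_rows[0]}
        if wanted <= present:
            return table_rows
    return None


# Made-up sample data in the same shape as big_data.
big_data = [
    [{"Name": "A", "Role": "Editor"}],
    [
        {"Version": "1", "Date ": "22 August 2016"},
        {"Version": "2", "Date ": "12 December 2016"},
    ],
]

match = find_table_by_headers(big_data, ["Version", "Date"])
print(match)
```

Matching on headings rather than on a fixed index keeps the lookup working if tables are added or removed earlier in the document, provided the headings of the target table are distinctive.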

So, that is half of your problem. The next part would be to use python-docx to create a new table in your output document - and to fill it with the appropriate content from the list / list / dictionary data.
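
A rough sketch of that second half might look like the following. It assumes the rows of the chosen table are already available as a list of dictionaries, as produced above. The names rows_to_matrix and write_table and the output file name are hypothetical; the python-docx calls used (Document, add_table, add_row, cell.text, save) are the library's documented ones.

```python
def rows_to_matrix(rows):
    """Turn a list of row dicts into [heading_row, data_row, ...] lists of strings."""
    if not rows:
        return []
    headers = list(rows[0].keys())
    matrix = [headers]
    for row in rows:
        matrix.append([str(row.get(header, "")) for header in headers])
    return matrix


def write_table(rows, output_path):
    """Write the row dicts as a table in a new .docx file (requires python-docx)."""
    # Imported here so that rows_to_matrix can be used without python-docx installed.
    from docx import Document

    matrix = rows_to_matrix(rows)
    if not matrix:
        return
    document = Document()
    # Start with a single row for the headings, then add one row per data row.
    table = document.add_table(rows=1, cols=len(matrix[0]))
    for cell, heading in zip(table.rows[0].cells, matrix[0]):
        cell.text = heading
    for values in matrix[1:]:
        cells = table.add_row().cells
        for cell, value in zip(cells, values):
            cell.text = value
    document.save(output_path)


# Made-up sample rows in the same shape as one entry of big_data.
sample_rows = [{"Version": "1", "Date": "22 August 2016"}]
matrix = rows_to_matrix(sample_rows)
print(matrix)
# write_table(sample_rows, "output.docx")  # hypothetical output file name
```

To arrange the data differently from the source table, reorder or filter the headings in rows_to_matrix before writing.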

In the example I have been working on, the table of interest is the final table in the document.

When I run the routine above, the output ends with the rows of that table:

[{'Version': '1', 'Changes': 'Local Outcome Improvement Plan ', 'Page Number': '1-34 and 42-61', 'Approved By': 'CPA Board\n', 'Date ': '22 August 2016'}, 
{'Version': '2', 'Changes': 'People are resilient, included and supported when in need section added ', 'Page Number': '35-41', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'}, 
{'Version': '2', 'Changes': 'Updated governance and accountability structure following approval of the Final Report for the Review of CPA Infrastructure', 'Page Number': '59', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'}]]


Source: https://stackoverflow.com/questions/49178914/how-to-retrieve-particular-table-data-in-multiple-tables-using-python-docx
