Question
I am using python-docx to extract data from one particular table in a Word file that contains multiple tables. This is the particular table among those tables, and the retrieved data needs to be arranged like this.
Challenges:
- Can I find a particular table in a Word file using python-docx?
- Can I achieve my requirement using python-docx?
Answer 1:
This is not a complete answer, but it should point you in the right direction. It is based on a similar task I have been working on.
I ran the following code in Python 3.6 in a Jupyter notebook, but it should work in plain Python as well.
First, we start by importing Document from python-docx and pointing it at the document we want to work with.
from docx.api import Document
document = Document("<your path to doc>")  # replace with the path to your .docx file
We get the document's list of tables and print how many tables it contains. We also create a list to hold all the tabular data.
tables = document.tables
print(len(tables))
big_data = []
Next we loop through the tables:
for table in document.tables:
    data = []  # rows of the current table, one dict per row
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        if i == 0:
            # The first row holds the column headings.
            keys = tuple(text)
            continue
        # Map each heading to this row's cell text.
        row_data = dict(zip(keys, text))
        data.append(row_data)
    # print(data)
    big_data.append(data)
print(big_data)
By looping through all the tables, we read the data into a list of lists. Each inner list represents one table, and within it there is one dictionary per row (the heading row itself is not stored as data). Each dictionary contains key/value pairs: the key is the column heading from the table, and the value is the cell content for that column in that row.
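To pick out one particular table from that structure, one option is to look for the table whose column headings match the ones you expect. The following is only a minimal sketch under that assumption: find_table_by_headers is a hypothetical helper (not part of python-docx), and the sample data is made up to mirror the list / list / dictionary shape described above. Headings are stripped before comparing, because cell text can carry stray whitespace.

```python
def find_table_by_headers(big_data, wanted_headers):
    """Return the first table whose headings include all wanted_headers.

    big_data is a list of tables; each table is a list of row dicts.
    Returns None when no table matches. A table with no data rows
    is an empty list here, so it cannot be matched by this approach.
    """
    wanted = {h.strip() for h in wanted_headers}
    for table in big_data:
        if not table:
            continue  # no data rows, so no headings to inspect
        headers = {key.strip() for key in table[0]}
        if wanted <= headers:  # every wanted heading is present
            return table
    return None


# Illustrative data only (assumed shape, shortened values).
sample = [
    [{"Name": "A", "Role": "Editor"}],
    [{"Version": "1", "Date ": "22 August 2016"},
     {"Version": "2", "Date ": "12 December 2016"}],
]

match = find_table_by_headers(sample, ["Version", "Date"])
print(len(match))  # prints 2
```

If the table you need is always in the same position, indexing document.tables directly (for example document.tables[-1] for the last one) is simpler than matching on headings.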
So, that is half of your problem. The next part would be to use python-docx to create a new table in your output document - and to fill it with the appropriate content from the list / list / dictionary data.
In the example I have been working on this is the final table in the document.
When I run the routine above, this is my output:
[{'Version': '1', 'Changes': 'Local Outcome Improvement Plan ', 'Page Number': '1-34 and 42-61', 'Approved By': 'CPA Board\n', 'Date ': '22 August 2016'},
{'Version': '2', 'Changes': 'People are resilient, included and supported when in need section added ', 'Page Number': '35-41', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'},
{'Version': '2', 'Changes': 'Updated governance and accountability structure following approval of the Final Report for the Review of CPA Infrastructure', 'Page Number': '59', 'Approved By': 'CPA Board', 'Date ': '12 December 2016'}]]
Source: https://stackoverflow.com/questions/49178914/how-to-retrieve-particular-table-data-in-multiple-tables-using-python-docx