I\'m trying to extract a content of tables in DOCX Word document, and boy I\'m new to xml/xpath.
from docx import *
document = opendocx(\'someFile.docx\')
ta
After some back and forth, we found out that a namespace was needed for this to work correctly. The xpath method is the appropriate solution, it just needs to have the document namespace passed in first.
The lxml xpath method has the details for namespace stuff. Look down the page in the link for passing a namespaces dictionary, and other details.
As explained by mgierdal in his comment above:
tblList = document.xpath('//w:tbl', namespaces=document.nsmap) works like a dream. So, as I understand it w: is a shorthand that has to be expanded to the full namespace name, and the dictionary for that is provided by document.nsmap.