Extracting tables from a DOCX Word document in Python

孤独总比滥情好 2021-01-13 07:26

I'm trying to extract the contents of the tables in a DOCX Word document, and I'm new to XML/XPath.

from docx import *
document = opendocx('someFile.docx')
ta


      
      
        
2 answers

执笔经年 (original poster) 2021-01-13 08:01

After some back and forth, we found out that a namespace was needed for this to work correctly. The xpath method is the appropriate solution; it just needs the document namespace passed in.

The lxml documentation for the xpath method covers the namespace details. Look further down that page for how to pass a namespaces dictionary, among other things.

As explained by mgierdal in his comment above:


  tblList = document.xpath('//w:tbl', namespaces=document.nsmap) works
  like a dream. So, as I understand it w: is a shorthand that has to be
  expanded to the full namespace name, and the dictionary for that is
  provided by document.nsmap.
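
For anyone who wants to see the prefix-to-namespace idea in isolation, here is a minimal sketch. It does not use lxml's xpath; it swaps in the standard library's zipfile and xml.etree.ElementTree, whose findall accepts the same kind of namespaces dictionary. The NS dictionary plays the role that document.nsmap plays in the comment above. The helper names (tables_from_xml, tables_from_docx) and the SAMPLE string are made up for illustration, and the sketch assumes rows and cells are direct children of their table and row, which simple documents satisfy but content controls may not.

```python
import zipfile
import xml.etree.ElementTree as ET

# "w" is only a local shorthand; the parser matches on the full namespace URI.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def tables_from_xml(xml_text):
    """Return every table as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    # './/w:tbl' only resolves because NS tells findall what 'w' stands for.
    for tbl in root.findall(".//w:tbl", NS):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            cells = []
            for tc in tr.findall("w:tc", NS):
                # A cell's text is spread over w:t runs; join them in order.
                # Text of a nested table also ends up in the enclosing cell.
                cells.append("".join(t.text or "" for t in tc.iterfind(".//w:t", NS)))
            rows.append(cells)
        tables.append(rows)
    return tables


def tables_from_docx(path):
    """Hypothetical helper: read the main document part out of the .docx zip."""
    with zipfile.ZipFile(path) as archive:
        return tables_from_xml(archive.read("word/document.xml"))


# Tiny hand-written stand-in for word/document.xml with one 1x2 table.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body><w:tbl>'
    "<w:tr>"
    "<w:tc><w:p><w:r><w:t>a</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>b</w:t></w:r></w:p></w:tc>"
    "</w:tr>"
    "</w:tbl></w:body></w:document>" % NS["w"]
)
```

Calling tables_from_xml(SAMPLE) gives one table with one row of two cells; tables_from_docx('someFile.docx') would do the same for a real file. Without the NS argument, the 'w:' prefix cannot be resolved, which is the same problem the namespaces argument solves in lxml.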

    
             
                                                        
            
            
              
                