Parsing XML with namespaces using ElementTree in Python


I have an XML file; a small part of it looks like this:


                      
2 answers

隐瞒了意图╮
2020-12-18 16:27

From what I gather, this has to do with how ElementTree (ET) handles namespaces.

From http://effbot.org/zone/element-namespaces.htm:


  When you save an Element tree to XML, the standard Element serializer generates unique prefixes for all URI:s that appear in the tree. The prefixes usually have the form “ns” followed by a number. For example, the above elements might be serialized with the prefix ns0 for “http://www.w3.org/1999/xhtml” and ns1 for “http://effbot.org/namespace/letters”.


If you want to use specific prefixes, you can add prefix/uri mappings to a global table in the ElementTree module. In 1.3 and later, you do this by calling the register_namespace function. In earlier versions, you can access the internal table directly:

ElementTree 1.3

ET.register_namespace(prefix, uri)

ElementTree 1.2 (Python 2.5)

ET._namespace_map[uri] = prefix

Note the argument order; the function takes the prefix first, while the raw dictionary maps from URI:s to prefixes.
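
As a rough illustration of the passage above, the sketch below serializes a small tree before and after calling register_namespace. The element names (letter, body) and the prefix "let" are made up for this example; only the namespace URI comes from the quoted text.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the quoted effbot text; tag names are hypothetical.
LETTERS = "http://effbot.org/namespace/letters"

# Build a tiny tree whose tags live in that namespace.
root = ET.Element("{%s}letter" % LETTERS)
ET.SubElement(root, "{%s}body" % LETTERS)

# No prefix registered yet: the serializer generates one of the form nsN.
before = ET.tostring(root, encoding="unicode")
print(before)

# Register a preferred prefix (prefix first, then URI), then serialize again.
ET.register_namespace("let", LETTERS)
after = ET.tostring(root, encoding="unicode")
print(after)
```

Note that register_namespace changes a module-wide table, so the chosen prefix applies to every tree serialized afterwards in the same process.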
                                                                        
                                                        
            
            
              
                
暗喜
2020-12-18 16:44

This snippet from your question,

for a in root.findall('{urn:com:xml:data}image'):
    print(a.attrib)


does not output anything because it only looks for direct {urn:com:xml:data}image children of the root of the tree.

This slightly modified code,

for a in root.findall('.//{urn:com:xml:data}image'):
    print(a.attrib)


will print {'imageId': '1'} because it uses .//, which selects matching subelements on all levels.
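
Here is a self-contained sketch of the difference. The question's XML is not reproduced here, so the document below is a hypothetical stand-in: the element names data and images are assumptions, while the namespace urn:com:xml:data and the imageId attribute follow the snippets above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the question's XML: the image element is nested
# one level below the root, inside the default namespace urn:com:xml:data.
SAMPLE = """\
<data xmlns="urn:com:xml:data">
  <images>
    <image imageId="1"/>
  </images>
</data>"""

root = ET.fromstring(SAMPLE)
TAG = "{urn:com:xml:data}image"

# A bare tag matches only direct children of root, so nothing is found here.
direct = root.findall(TAG)
print(direct)  # prints []

# The .// prefix searches all descendants, so the nested element is found.
nested = root.findall(".//" + TAG)
for a in nested:
    print(a.attrib)  # prints {'imageId': '1'}
```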

Reference: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax.



It is a bit annoying that ElementTree does not retain the original namespace prefixes by default, but keep in mind that the prefixes are not what matters anyway; the namespace URI is. The register_namespace() function can be used to set the desired prefix when serializing the XML. It has no effect on parsing or searching.
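
To make this concrete, here is a minimal sketch assuming a hypothetical document in the urn:com:xml:data namespace. It registers the prefix "d" for serialization, shows that a prefixed search still needs its own prefix map, and uses the namespaces argument of findall with an unrelated prefix "x". The element names and both prefixes are illustrative choices, not taken from the question.

```python
import xml.etree.ElementTree as ET

URI = "urn:com:xml:data"

# Hypothetical document using the prefix "d" for the namespace.
SAMPLE = (
    '<d:data xmlns:d="urn:com:xml:data">'
    '<d:images><d:image imageId="1"/></d:images>'
    '</d:data>'
)
root = ET.fromstring(SAMPLE)

# Registering a prefix only influences how the tree is written out.
ET.register_namespace("d", URI)
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# A prefixed search without a prefix map fails, registered prefix or not.
try:
    root.findall(".//d:image")
    prefix_search_failed = False
except SyntaxError:
    prefix_search_failed = True
print(prefix_search_failed)

# Searching takes its own prefix-to-URI mapping; the prefix name is local to
# the call and need not match the one in the document or the registered one.
found = root.findall(".//x:image", namespaces={"x": URI})
print([el.attrib for el in found])
```

Passing the mapping explicitly keeps search code independent of whatever prefixes a particular document happens to use.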
                                                                        
                                                        
            
            
              
                