Using XPath in ElementTree

情话喂你 2020-11-30 00:51

My XML file looks like the following:




        
5 answers
  • 2020-11-30 00:56

    One of the most straightforward approaches, which works even with Python 3.0 and other versions, is shown below:

    It starts at the root and searches its descendants until it finds the specified "Amount" tag.

     from xml.etree import ElementTree as ET
     tree = ET.parse('output.xml')
     root = tree.getroot()
     #print(root)
     e = root.find(".//{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount")
     print(e.text)
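
If repeating the full URI in every search gets noisy, a prefix map is one alternative. The following is only a minimal sketch: the inline XML is an assumed, trimmed stand-in for the real output.xml, and the prefix name "aws" is arbitrary (only the URI has to match the document).

```python
from xml.etree import ElementTree as ET

# Assumed sample: a trimmed response that uses the namespace quoted above.
XML = """<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
  <Items><Item><ItemAttributes><ListPrice>
    <Amount>2260</Amount>
  </ListPrice></ItemAttributes></Item></Items>
</ItemSearchResponse>"""

# Hypothetical prefix "aws"; find() expands it to the URI via this mapping.
NS = {"aws": "http://webservices.amazon.com/AWSECommerceService/2008-08-19"}

root = ET.fromstring(XML)
amount = root.find(".//aws:Amount", NS)  # first Amount anywhere below the root
print(amount.text)
```

The same mapping can be passed to findall() and findtext(), so the URI only has to be written once.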
    
  • 2020-11-30 00:58

    I ended up stripping out the xmlns from the raw XML like this:

    import re

    def strip_ns(xml_string):
        return re.sub('xmlns="[^"]+"', '', xml_string)
    

    Obviously be very careful with this, but it worked well for me.
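
For what it's worth, here is a rough sketch of how the stripped string might then be parsed and searched. The RAW sample is an assumption standing in for the real file contents, and the comments note where this regex can go wrong.

```python
import re
from xml.etree import ElementTree as ET

def strip_ns(xml_string):
    # Removes default-namespace declarations only. Prefixed declarations
    # (xmlns:foo="...") are left alone, and any text content that happens
    # to contain xmlns="..." would be rewritten too, hence the caution.
    return re.sub('xmlns="[^"]+"', '', xml_string)

# Assumed sample input, trimmed to the one branch we care about.
RAW = (
    '<ItemSearchResponse '
    'xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">'
    "<Items><Item><ItemAttributes><ListPrice><Amount>2260</Amount>"
    "</ListPrice></ItemAttributes></Item></Items></ItemSearchResponse>"
)

root = ET.fromstring(strip_ns(RAW))
# With the namespace gone, plain tag names work in the path.
amounts = [e.text for e in root.findall("Items/Item/ItemAttributes/ListPrice/Amount")]
print(amounts)
```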

  • 2020-11-30 00:59
    from xml.etree import ElementTree as ET
    tree = ET.parse("output.xml")
    namespace = tree.getroot().tag[1:].split("}")[0]
    amount = tree.find(".//{%s}Amount" % namespace).text
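
One caveat with slicing the root tag: if the root has no namespace, the slice returns part of the tag name instead. A slightly more defensive sketch is below; the helper names root_namespace and find_amount, and the two tiny sample documents, are made up for illustration.

```python
from xml.etree import ElementTree as ET

def root_namespace(root):
    """Return the namespace URI of the root tag, or "" if it has none."""
    if root.tag.startswith("{"):
        return root.tag[1:].split("}")[0]
    return ""

def find_amount(root):
    """Find the first Amount element, with or without a namespace."""
    ns = root_namespace(root)
    path = ".//{%s}Amount" % ns if ns else ".//Amount"
    return root.find(path)

# Assumed sample documents: one namespaced, one not.
with_ns = ET.fromstring(
    '<ItemSearchResponse '
    'xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">'
    "<Amount>2260</Amount></ItemSearchResponse>"
)
without_ns = ET.fromstring(
    "<ItemSearchResponse><Amount>2260</Amount></ItemSearchResponse>"
)

print(root_namespace(with_ns))
print(find_amount(with_ns).text, find_amount(without_ns).text)
```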
    

    Also, consider using lxml. It is often faster and also supports full XPath 1.0 expressions.

    from lxml import etree as ET
    
  • 2020-11-30 01:06

    You have two problems.

    1) element contains only the root element, not recursively the whole document. It is of type Element, not ElementTree.

    2) Your search string needs to include the namespace if you keep the namespace in the XML.

    To fix problem #1:

    You need to change:

    element = ET.parse(fp).getroot()
    

    to:

    element = ET.parse(fp)
    

    To fix problem #2:

    You can take off the xmlns from the XML document so it looks like this:

    <?xml version="1.0"?>
    <ItemSearchResponse>
      <Items>
        <Item>
          <ItemAttributes>
            <ListPrice>
              <Amount>2260</Amount>
            </ListPrice>
          </ItemAttributes>
          <Offers>
            <Offer>
              <OfferListing>
                <Price>
                  <Amount>1853</Amount>
                </Price>
              </OfferListing>
            </Offer>
          </Offers>
        </Item>
      </Items>
    </ItemSearchResponse>
    

    With this document you can use the following search string:

    e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
    

    The full code:

    from xml.etree import ElementTree as ET

    # Parse into an ElementTree (not just the root Element)
    with open("output.xml", "r") as fp:
        element = ET.parse(fp)
    e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
    for i in e:
        print(i.text)
    

    Alternate fix to problem #2:

    Otherwise, you need to specify the xmlns inside the search string for each element.

    The full code:

    from xml.etree import ElementTree as ET

    with open("output.xml", "r") as fp:
        element = ET.parse(fp)

    # Every step of the path must carry the namespace in {uri} form
    namespace = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"
    e = element.findall('{0}Items/{0}Item/{0}ItemAttributes/{0}ListPrice/{0}Amount'.format(namespace))
    for i in e:
        print(i.text)
    

    Both print:

    2260
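
To see why only 2260 is printed and not the offer price, here is a small self-contained check against the namespace-free document shown earlier, held in a string rather than read from output.xml (that substitution is mine, purely so the snippet runs on its own).

```python
from xml.etree import ElementTree as ET

# The namespace-free document from above, condensed onto fewer lines.
XML = """<ItemSearchResponse>
  <Items>
    <Item>
      <ItemAttributes>
        <ListPrice><Amount>2260</Amount></ListPrice>
      </ItemAttributes>
      <Offers>
        <Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer>
      </Offers>
    </Item>
  </Items>
</ItemSearchResponse>"""

root = ET.fromstring(XML)

# The explicit path only follows the ListPrice branch.
explicit = [e.text for e in root.findall("Items/Item/ItemAttributes/ListPrice/Amount")]

# A descendant search matches every Amount, in document order.
anywhere = [e.text for e in root.findall(".//Amount")]

print(explicit)
print(anywhere)
```

So the explicit path is the safer choice when the document contains several Amount elements and only the list price is wanted.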

  • 2020-11-30 01:07

    ElementTree is namespace-aware, so every element in your XML has a name like {http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items

    So make the search include the namespace, e.g.:

    search = '{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Item/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemAttributes/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ListPrice/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount'
    element.findall( search )
    

    This returns the element corresponding to 2260.
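
Typing that search string by hand is error-prone, so it could be generated instead. This is just a sketch: qualify is a hypothetical helper, the sample document is an assumed stand-in for the real response, and the helper only handles plain tag steps (no ".", "..", "*" or predicates).

```python
from xml.etree import ElementTree as ET

NAMESPACE = "http://webservices.amazon.com/AWSECommerceService/2008-08-19"

def qualify(path, namespace=NAMESPACE):
    """Prefix every step of a slash-separated path with {namespace}."""
    return "/".join("{%s}%s" % (namespace, step) for step in path.split("/"))

search = qualify("Items/Item/ItemAttributes/ListPrice/Amount")

# Assumed sample document using the same namespace.
element = ET.fromstring(
    '<ItemSearchResponse xmlns="%s">'
    "<Items><Item><ItemAttributes><ListPrice><Amount>2260</Amount>"
    "</ListPrice></ItemAttributes></Item></Items></ItemSearchResponse>" % NAMESPACE
)

found = [e.text for e in element.findall(search)]
print(found)
```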
