Parsing XML containing a default namespace to get an element value using lxml

耶瑟儿~ 2020-11-29 12:58

I have an XML string like this:

str1 = """

    
                


        
1 answer
  • 2020-11-29 13:28

    This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:

    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    

    Note that it is not only the element where the default namespace is declared that is in that namespace: all descendant elements inherit the ancestor's default namespace implicitly, unless otherwise specified (by an explicit namespace prefix, or by a local default namespace declaration that points to a different namespace URI). In this case, that means every element, including loc, is in the default namespace.
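To see this inheritance concretely, here is a minimal sketch, assuming Python 3 with lxml installed (the names `xml_text`, `tags` and `unprefixed` are illustrative, not from the question). It prints the tag lxml reports for every element and shows that an XPath without a prefix matches nothing:

```python
from lxml import etree

# Illustrative document with the same default namespace as in the question.
xml_text = '''<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
    <loc>http://www.example.org/sitemap_1.xml.gz</loc>
    <lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>'''
root = etree.fromstring(xml_text)

# lxml reports each tag as "{namespace-uri}local-name", so the inherited
# default namespace is visible on every descendant, not just the root.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)

# In XPath 1.0 an unprefixed name only matches elements in no namespace,
# so this query comes back empty.
unprefixed = root.xpath("//loc")
print(unprefixed)
```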

    To select an element that is in a namespace, define a prefix-to-namespace mapping and use that prefix in the XPath. The prefix name is arbitrary (d here); only the namespace URI has to match:

    from lxml import etree
    str1 = '''<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>
            http://www.example.org/sitemap_1.xml.gz
        </loc>
        <lastmod>2015-07-01</lastmod>
    </sitemap>
    </sitemapindex>'''
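    # Parse the string; root is the <sitemapindex> element.
    root = etree.fromstring(str1)
    
    # Map an arbitrary prefix ("d") to the default namespace URI.
    ns = {"d": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    url = root.xpath("//d:loc", namespaces=ns)[0]
    # encoding="unicode" makes tostring() return str instead of bytes on Python 3.
    print(etree.tostring(url, encoding="unicode"))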
    

    Output:

    <loc xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
            http://www.example.org/sitemap_1.xml.gz
        </loc>
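If only the element's value is wanted rather than its serialized markup, the text can be read directly. The following is a sketch under the same assumptions (Python 3, lxml; the variable names are illustrative) showing three equivalent ways to do it:

```python
from lxml import etree

# Illustrative document; the URL is surrounded by whitespace as in the answer.
xml_text = '''<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
    <loc>
        http://www.example.org/sitemap_1.xml.gz
    </loc>
    <lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>'''
root = etree.fromstring(xml_text)
ns = {"d": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Option 1: select the element, then read and trim its .text.
loc = root.xpath("//d:loc", namespaces=ns)[0]
value = loc.text.strip()

# Option 2: let XPath return the whitespace-normalized string of the first match.
value_xpath = root.xpath("normalize-space(//d:loc)", namespaces=ns)

# Option 3: findtext() with the namespace URI written in braces before the tag.
value_find = root.findtext(
    ".//{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
).strip()

print(value)
```

Option 1 stays closest to the answer's code. Option 3 avoids the prefix mapping entirely, at the cost of repeating the full URI in the path.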
    