Python xpath not working?

北恋 2021-01-26 08:10

Okay, this is starting to drive me a little bit nuts. I've tried several xml/xpath libraries for Python, and can't figure out a simple way to get a stinkin' "title" element.

2 answers
  • 2021-01-26 08:59

    It is indeed the namespaces. It was a bit tricky to find in the lxml docs, but here's how you do it:

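    from lxml import etree

    # etree.parse accepts a filename directly, so no open() handle is left dangling.
    doc = etree.parse('index.html')
    # Map a prefix of your choice to the Atom namespace URI and use it in the path.
    doc.xpath('//default:title', namespaces={'default': 'http://www.w3.org/2005/Atom'})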
    

    You can also do this:

    title_finder = etree.ETXPath('//{http://www.w3.org/2005/Atom}title')
    title_finder(doc)
    

    And you'll get the titles back in both cases.

  • 2021-01-26 09:04

    You probably just have to take into account the namespace of the document which you're dealing with.

    I'd suggest looking up how to deal with namespaces in Amara:

    http://www.xml3k.org/Amara/Manual#namespaces

    Edit: Using your code snippet I made some edits. I don't know what version of Amara you're using but based on the docs I tried to accommodate it as much as possible:

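    import amara
    from urllib2 import urlopen  # assumption: Python 2, as the u'' literals suggest

    def view(req, url):
        req.content_type = 'text/plain'
        # Prefixes for the namespaces used in the feed.
        ns = {u'f': u'http://www.w3.org/2005/Atom',
              u't': u'http://purl.org/syndication/thread/1.0'}
        doc = amara.parse(urlopen(url), prefixes=ns)
        req.write(str(doc.xml_xpath(u'f:title')))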
    
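Both answers come down to the same thing: the elements live in the Atom namespace, so a bare `title` path matches nothing. Here is a minimal sketch of that behaviour. Note that it uses the standard library's `xml.etree.ElementTree` rather than lxml or Amara, so it runs without installing anything, and the feed text is a made-up example, not the asker's document.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal Atom feed, invented for illustration only.
ATOM_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry>
    <title>First entry</title>
  </entry>
</feed>"""

ATOM_NS = "http://www.w3.org/2005/Atom"

root = ET.fromstring(ATOM_FEED)

# No namespace in the path: every element is in the Atom namespace,
# so this finds nothing.
unqualified = root.findall(".//title")

# Option 1: map a prefix (any name you like) to the namespace URI.
ns = {"atom": ATOM_NS}
prefixed = [el.text for el in root.findall(".//atom:title", ns)]

# Option 2: spell the namespace out in {uri}tag form, no prefix map needed.
clark = [el.text for el in root.findall(".//{%s}title" % ATOM_NS)]

print(unqualified)
print(prefixed)
print(clark)
```

The prefix name is arbitrary; only the URI has to match the `xmlns` declared in the document. The `{uri}tag` form is the same notation the `ETXPath` example in the first answer uses.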