How to parse an XML feed using Python?

[愿得一人] 2021-01-01 03:11

I am trying to parse this XML feed (http://www.reddit.com/r/videos/top/.rss) and am having trouble doing so. I am trying to save the YouTube links in each of the items, but am

2 answers
  • 2021-01-01 03:37

    I wrote this for you using XPath expressions (tested successfully):

    from lxml import etree
    import urllib.request

    # send a browser-like User-Agent header with the request
    headers = {'User-Agent': 'Mozilla/5.0'}
    req = urllib.request.Request('http://www.reddit.com/r/videos/top/.rss', None, headers)
    reddit_file = urllib.request.urlopen(req).read()

    # parse the downloaded bytes into an element tree
    reddit = etree.fromstring(reddit_file)

    # each XPath call returns a list, so [0] takes the first match
    for item in reddit.xpath('/rss/channel/item'):
        print("title =", item.xpath("./title/text()")[0])
        print("description =", item.xpath("./description/text()")[0])
        # local-name() matches the thumbnail element regardless of its namespace
        print("thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0])
        print("link =", item.xpath("./link/text()")[0])
        print("-" * 100)
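
Since the question asks for the YouTube links specifically, here is a minimal sketch of pulling them out of each item's description. It assumes the description holds escaped HTML that contains the links, which the feed shown here does not confirm. The sample feed `SAMPLE_RSS`, the pattern `YOUTUBE_HREF`, and the helper `youtube_links` are hypothetical names, and only the standard library is used so the sketch runs offline.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical sample standing in for the downloaded feed (assumed layout).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>First video</title>
      <description>&lt;a href="http://www.youtube.com/watch?v=abc123"&gt;[link]&lt;/a&gt;</description>
    </item>
    <item>
      <title>Second video</title>
      <description>&lt;a href="http://youtu.be/xyz789"&gt;[link]&lt;/a&gt; &lt;a href="http://example.com/"&gt;[comments]&lt;/a&gt;</description>
    </item>
  </channel>
</rss>"""

# Matches href values that point at youtube.com or youtu.be.
YOUTUBE_HREF = re.compile(
    r'href="(https?://(?:www\.)?(?:youtube\.com|youtu\.be)/[^"]+)"'
)


def youtube_links(rss_text):
    """Return a list of (title, [youtube urls]) pairs, one per <item>."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.findall('channel/item'):
        title = item.findtext('title', default='')
        # The parser has already unescaped the HTML inside <description>.
        description = item.findtext('description', default='')
        results.append((title, YOUTUBE_HREF.findall(description)))
    return results


links = youtube_links(SAMPLE_RSS)
for title, urls in links:
    print(title, urls)
```

To use this on the live feed, pass the downloaded feed text to `youtube_links` instead of `SAMPLE_RSS`. The regular expression is a shortcut that may miss links written in other forms; an HTML parser would be the sturdier choice if the descriptions vary.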
    
  • 2021-01-01 03:50

    You can try findall('channel/item'), which returns every item element under the channel:

    import urllib.request
    from xml.etree import ElementTree as etree

    # fetch the reddit feed
    reddit_file = urllib.request.urlopen('http://www.reddit.com/r/videos/top/.rss')
    # read the response body (bytes)
    reddit_data = reddit_file.read()
    print(reddit_data)
    # close the response because we don't need it anymore
    reddit_file.close()

    # parse the entire feed
    reddit_root = etree.fromstring(reddit_data)
    items = reddit_root.findall('channel/item')
    print(items)

    reddit_feed = []
    for entry in items:
        # get the description of each item
        desc = entry.findtext('description')
        reddit_feed.append([desc])
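
The loop above only collects the description. As a sketch of also reading the link and the thumbnail URL with ElementTree, the example below assumes the thumbnail is a Media RSS element (`media:thumbnail` with a `url` attribute, in the `http://search.yahoo.com/mrss/` namespace), as the local-name() XPath in the other answer suggests. The sample feed `SAMPLE_RSS`, the `NAMESPACES` map, and the `parse_items` helper are hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample feed; the media namespace is an assumption about the real one.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First video</title>
      <link>http://www.reddit.com/r/videos/comments/1/</link>
      <description>first description</description>
      <media:thumbnail url="http://example.com/thumb1.jpg" />
    </item>
    <item>
      <title>No thumbnail</title>
      <link>http://www.reddit.com/r/videos/comments/2/</link>
      <description>second description</description>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map passed to find(); the prefix name itself is arbitrary.
NAMESPACES = {'media': 'http://search.yahoo.com/mrss/'}


def parse_items(rss_text):
    """Return one dict per <item> with description, link and thumbnail URL."""
    root = ET.fromstring(rss_text)
    feed = []
    for entry in root.findall('channel/item'):
        thumb = entry.find('media:thumbnail', NAMESPACES)
        feed.append({
            'description': entry.findtext('description'),
            'link': entry.findtext('link'),
            # find() returns None when the element is missing
            'thumbnail': thumb.get('url') if thumb is not None else None,
        })
    return feed


reddit_feed = parse_items(SAMPLE_RSS)
for row in reddit_feed:
    print(row)
```

Checking for `None` before reading the attribute keeps the loop from failing on items that have no thumbnail.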
    