Using minidom to parse XML

Submitted by 守給你的承諾、 on 2019-12-13 02:43:12

Question


Hi, I'm having trouble understanding Python's minidom module.

I have XML that looks like this:

<Show>
<name>Dexter</name>
<totalseasons>7</totalseasons>
<Episodelist>
<Season no="1">
<episode>
<epnum>1</epnum>
<seasonnum>01</seasonnum>
<prodnum>101</prodnum>
<airdate>2006-10-01</airdate>
<link>http://www.tvrage.com/Dexter/episodes/408409</link>
<title>Dexter</title>
</episode>
<episode>
<epnum>2</epnum>
<seasonnum>02</seasonnum>
<prodnum>102</prodnum>
<airdate>2006-10-08</airdate>
<link>http://www.tvrage.com/Dexter/episodes/408410</link>
<title>Crocodile</title>
</episode>
<episode>
<epnum>3</epnum>
<seasonnum>03</seasonnum>
<prodnum>103</prodnum>
<airdate>2006-10-15</airdate>
<link>http://www.tvrage.com/Dexter/episodes/408411</link>
<title>Popping Cherry</title>
</episode>

The full feed is easier to read here: http://services.tvrage.com/feeds/episode_list.php?sid=7926

And this is my Python code that tries to read from it:

xml = minidom.parse(urlopen("http://services.tvrage.com/feeds/episode_list.php?sid=7926"))
for episode in xml.getElementsByTagName('episode'):
    for node in episode.attributes['title']:
        print node.data

I can't get the actual episode data out. I want all the data from each episode, and I've tried different variants, but none of them work. Mostly I just get a <DOM Element: asdasd> back. I only care about the data inside each episode.

Thanks for the help


Answer 1:


Each episode element has child elements, including a title element. Your code, however, is looking for attributes instead.

To get text out of a minidom element, you need a helper function:

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

And then you can more easily print all the titles:

for episode in xml.getElementsByTagName('episode'):
    for title in episode.getElementsByTagName('title'):
        # getText expects a node list, so pass the element's children
        print(getText(title.childNodes))
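
Since the question asks for all the data in each episode, the same helper can be stretched a little further. What follows is only a minimal sketch, assuming the feed is well-formed and shaped like the excerpt in the question: SAMPLE is a shortened stand-in for the real feed, and get_text and episode_to_dict are hypothetical names chosen for illustration, not part of minidom.

```python
from xml.dom import minidom

# Shortened stand-in for the feed in the question (assumed well-formed).
SAMPLE = """<Show>
<name>Dexter</name>
<Episodelist>
<Season no="1">
<episode><epnum>1</epnum><airdate>2006-10-01</airdate><title>Dexter</title></episode>
<episode><epnum>2</epnum><airdate>2006-10-08</airdate><title>Crocodile</title></episode>
</Season>
</Episodelist>
</Show>"""


def get_text(nodelist):
    """Join the data of all text nodes found directly in nodelist."""
    return ''.join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)


def episode_to_dict(episode):
    """Map each child element's tag name to its text (hypothetical helper)."""
    return {
        child.tagName: get_text(child.childNodes)
        for child in episode.childNodes
        if child.nodeType == child.ELEMENT_NODE  # skip whitespace text nodes
    }


doc = minidom.parseString(SAMPLE)
episodes = [episode_to_dict(e) for e in doc.getElementsByTagName('episode')]
for ep in episodes:
    print(ep['epnum'], ep['airdate'], ep['title'])
```

Filtering on ELEMENT_NODE matters with the real feed, because the line breaks between tags show up as text nodes among an episode's children.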



Answer 2:


title is not an attribute; it's an element (tag). An attribute is something like src in <img src="foo.jpg" />.

>>> from xml.dom.minidom import parseString
>>> parsed = parseString(s)  # s holds the XML document as a string
>>> titles = [n.firstChild.data for n in parsed.getElementsByTagName('title')]
>>> titles
[u'Dexter', u'Crocodile', u'Popping Cherry']

You can extend the above to fetch other details. lxml is better suited for this, though; as the snippet above shows, minidom is not that friendly.
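
To make the attribute versus element distinction concrete with the feed's own structure: the no in <Season no="1"> is an attribute, while title is a child element. This is a rough sketch using a cut-down SAMPLE string that I am assuming matches the shape of the real feed.

```python
from xml.dom import minidom

# Cut-down stand-in for the feed in the question (assumed shape).
SAMPLE = """<Episodelist>
<Season no="1">
<episode><epnum>1</epnum><title>Dexter</title></episode>
</Season>
</Episodelist>"""

doc = minidom.parseString(SAMPLE)
season = doc.getElementsByTagName('Season')[0]
episode = season.getElementsByTagName('episode')[0]

# "no" is an attribute: it lives inside the <Season ...> start tag.
season_no = season.getAttribute('no')

# "title" is a child element: its text is a separate text node beneath it.
title = episode.getElementsByTagName('title')[0].firstChild.data

# Asking the episode for a "title" attribute finds nothing.
has_title_attr = episode.hasAttribute('title')

print(season_no, title, has_title_attr)
```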




Answer 3:


Thanks to Martijn Pieters, who pointed me to the ElementTree API, I solved this problem.

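import xml.etree.ElementTree as ET
# urlopen comes from urllib2 (Python 2) or urllib.request (Python 3)

xml = ET.parse(urlopen("http://services.tvrage.com/feeds/episode_list.php?sid=7926"))
print('xml fetched..')
for episode in xml.iter('episode'):
    print(episode.find('title').text)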
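
For anyone who wants to try the ElementTree approach without hitting the network, here is a small sketch that parses a string instead and groups titles by season. SAMPLE is a shortened stand-in that I am assuming matches the feed's layout, and titles_by_season is just a name picked for the example.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed in the question (assumed layout).
SAMPLE = """<Show>
<name>Dexter</name>
<Episodelist>
<Season no="1">
<episode><epnum>1</epnum><title>Dexter</title></episode>
<episode><epnum>2</epnum><title>Crocodile</title></episode>
</Season>
</Episodelist>
</Show>"""

root = ET.fromstring(SAMPLE)

titles_by_season = {}
for season in root.iter('Season'):
    # season.get() reads an attribute; findtext() reads a child element's text.
    titles_by_season[season.get('no')] = [
        ep.findtext('title') for ep in season.findall('episode')
    ]

print(titles_by_season)
```

Compared with minidom, the element text is available directly via .text or findtext(), so no helper function is needed.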

Thanks



Source: https://stackoverflow.com/questions/12338877/using-minidom-to-parse-xml
