Question
Working in Python, I want to parse an XML document I made and build a nested list of lists, so that I can access the feeds later and parse them. The XML document resembles the following snippet:
<?xml version="1.0"?>
<sources>
<!--Source List by Institution-->
<sourceList source="cbc">
<f>http://rss.cbc.ca/lineup/topstories.xml</f>
</sourceList>
<sourceList source="bbc">
<f>http://feeds.bbci.co.uk/news/rss.xml</f>
<f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
<f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
</sourceList>
<sourceList source="reuters">
<f>http://feeds.reuters.com/reuters/topNews</f>
<f>http://feeds.reuters.com/news/artsculture</f>
</sourceList>
</sources>
I would like something like nested lists, where the innermost list holds the content between the <f></f> tags and the list above it is named after the source, e.g. source="reuters" would become reuters. Retrieving the information from the XML document isn't a problem; I'm doing it with ElementTree, looping over the nodes and reading values with node.get('source') and so on. The problem is generating lists with the desired names and the different lengths required by the different sources. I have tried appending, but I am unsure how to append to a list whose name was retrieved from the file. Would a dictionary be better? What would be the best practice in this situation, and how might I make this work? If any more information is required, just post a comment and I'll be sure to add it.
Answer 1:
From your description, a dictionary that maps each source name to its list of feeds might do the trick.
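To make the target shape concrete, here is a small hand-written sketch of that dictionary, filled in with the feed URLs from the question's XML. The literal is typed out by hand rather than parsed, and `bbc_feeds` and `feed_counts` are illustrative names only.

```python
# Hand-written illustration of the target structure (not parsed from the file):
# each key is a source name, each value is that source's list of feed URLs.
news_sources = {
    "cbc": ["http://rss.cbc.ca/lineup/topstories.xml"],
    "bbc": [
        "http://feeds.bbci.co.uk/news/rss.xml",
        "http://feeds.bbci.co.uk/news/world/rss.xml",
        "http://feeds.bbci.co.uk/news/uk/rss.xml",
    ],
    "reuters": [
        "http://feeds.reuters.com/reuters/topNews",
        "http://feeds.reuters.com/news/artsculture",
    ],
}

# The inner lists may have different lengths; look one up by source name.
bbc_feeds = news_sources["bbc"]

# Count the feeds per source.
feed_counts = {name: len(feeds) for name, feeds in news_sources.items()}
```

Because the source names are data read from the file, using them as dictionary keys avoids having to create a separately named list variable for each source.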
Here is one way to construct such a beast:
from lxml import etree
from pprint import pprint

news_sources = {
    source.attrib['source']: [feed.text for feed in source.xpath('./f')]
    for source in etree.parse('x.xml').xpath('/sources/sourceList')
}
pprint(news_sources)
Another sample, without lxml or xpath:
import xml.etree.ElementTree as ET
from pprint import pprint

news_sources = {
    source.attrib['source']: [feed.text for feed in source]
    for source in ET.parse('x.xml').getroot()
}
pprint(news_sources)
Finally, if you are allergic to list comprehensions:
import xml.etree.ElementTree as ET
from pprint import pprint

xml = ET.parse('x.xml')
root = xml.getroot()

news_sources = {}
# Each child of the root is a <sourceList> element.
for sourceList in root:
    sourceListName = sourceList.attrib['source']
    news_sources[sourceListName] = []
    # Each child of a <sourceList> is an <f> element holding one feed URL.
    for feed in sourceList:
        feedName = feed.text
        news_sources[sourceListName].append(feedName)

pprint(news_sources)
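As a further sketch, the same loop can be wrapped in a function that works on an XML string instead of the file x.xml, which makes it easy to try out without a file on disk. The names `SAMPLE_XML` and `build_source_map` are hypothetical. Merging repeated source names and skipping empty <f> elements are assumptions added here, not behaviour of the samples above.

```python
import xml.etree.ElementTree as ET

# Inline copy of the question's document, so the example needs no file.
SAMPLE_XML = """<?xml version="1.0"?>
<sources>
  <!--Source List by Institution-->
  <sourceList source="cbc">
    <f>http://rss.cbc.ca/lineup/topstories.xml</f>
  </sourceList>
  <sourceList source="bbc">
    <f>http://feeds.bbci.co.uk/news/rss.xml</f>
    <f>http://feeds.bbci.co.uk/news/world/rss.xml</f>
    <f>http://feeds.bbci.co.uk/news/uk/rss.xml</f>
  </sourceList>
  <sourceList source="reuters">
    <f>http://feeds.reuters.com/reuters/topNews</f>
    <f>http://feeds.reuters.com/news/artsculture</f>
  </sourceList>
</sources>"""


def build_source_map(xml_text):
    """Return a dict mapping each source name to its list of feed URLs."""
    root = ET.fromstring(xml_text)
    source_map = {}
    for source_list in root.findall("sourceList"):
        name = source_list.get("source")
        if name is None:
            # Assumption: a <sourceList> without a source attribute is ignored.
            continue
        # setdefault reuses the existing list if this name was seen before,
        # so a repeated source name extends its list instead of replacing it.
        feeds = source_map.setdefault(name, [])
        for feed in source_list.findall("f"):
            url = (feed.text or "").strip()
            if url:  # skip empty <f/> elements
                feeds.append(url)
    return source_map


news_sources = build_source_map(SAMPLE_XML)
for name, feeds in news_sources.items():
    print(name, len(feeds))
```

Individual feeds can then be reached with ordinary indexing, for example news_sources["reuters"][0].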
Source: https://stackoverflow.com/questions/25007042/generating-nested-lists-from-xml-doc