I was trying to parse the following XML and fetch the specific tags I'm interested in for my business need, and I guess I'm doing something wrong. Not sure how to parse my
Assuming there is no issue parsing your XML from the URL (the link is not available on our end), your first `lxml` attempt can work if you parse on nodes that actually exist. Specifically, the node your code searches for does not exist in the XML document; use `link` instead. Also consider a nested list/dict comprehension to migrate the content to a data frame. With `lxml` you can swap `findall` for `xpath` and get the same result.
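import pandas as pd

# root is the already-parsed root element of the XML document.
# One dict per <link>: child tag -> its own text, or the first grandchild's text if the child has none.
df = pd.DataFrame([{item.tag: item.text if (item.text or "").strip() else item.findtext("*")
                    for item in lnk.findall("*")}
                   for lnk in root.findall('.//link')])
print(df)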
# categories promotiontypes offerdescription ... advertiserid advertisername network
# 0 Apparel Percentage off 25% Off Boys Quiksilver Apparel. Shop now at M... ... 3184 cys.com US Network
# 1 Apparel Percentage off 25% Off Boys' Quiksilver Apparel. Shop now at ... ... 3184 cys.com US Network
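Since the original feed is not available, here is a minimal, self-contained sketch of the same nested comprehension run against a made-up document. The `SAMPLE` XML, its nesting under `categories`, the second record's values, and the helper name `link_rows` are all assumptions for illustration and may not match the real feed. It uses only the standard library's `xml.etree.ElementTree`, so it builds a list of dicts rather than a data frame.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real feed is not available, so this structure is assumed.
SAMPLE = """
<feed>
  <link>
    <categories>
      <category>Apparel</category>
    </categories>
    <offerdescription>25% Off</offerdescription>
    <advertiserid>3184</advertiserid>
  </link>
  <link>
    <categories>
      <category>Shoes</category>
    </categories>
    <offerdescription>Free shipping</offerdescription>
    <advertiserid>4000</advertiserid>
  </link>
</feed>
"""


def link_rows(root):
    """Return one dict per <link>: child tag -> own text, else first grandchild's text."""
    return [
        {
            # A child like <categories> holds only whitespace as text,
            # so fall back to the text of its first child element.
            item.tag: item.text if (item.text or "").strip() else item.findtext("*")
            for item in lnk.findall("*")
        }
        for lnk in root.findall(".//link")
    ]


root = ET.fromstring(SAMPLE)
rows = link_rows(root)
for row in rows:
    print(row)
```

The resulting `rows` list can be passed straight to `pd.DataFrame(rows)`. If the document is parsed with `lxml` instead, `root.xpath('.//link')` and `lnk.xpath('*')` should select the same elements for a namespace-free document like this one.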