Python XML Parse and getElementsByTagName



感动是毒 2021-01-21 06:05

I was trying to parse the following XML and fetch the specific tags that I'm interested in for my business need, and I guess I'm doing something wrong. Not sure how to parse my

3 Answers

囚心锁ツ (OP)

2021-01-21 06:25

Assuming there is no issue parsing your XML from the URL (the link is not available on our end), your first lxml attempt can work if you parse on nodes that actually exist. Specifically, the node it searches for does not appear in the XML document.

Instead, use link. Then consider a nested list/dict comprehension to migrate the content to a data frame. With lxml, findall and xpath are interchangeable here and return the same result.

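import pandas as pd

# root: root element of the parsed feed. A child of <link> with no text of its own takes its first sub-element's text.
df = pd.DataFrame([{item.tag: item.text if (item.text or "").strip() != "" else item.findtext("*")
                       for item in lnk.findall("*")}
                       for lnk in root.findall('.//link')])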
                       
print(df)
#   categories  promotiontypes                                   offerdescription  ... advertiserid advertisername     network
# 0    Apparel  Percentage off  25% Off Boys Quiksilver Apparel. Shop now at M...  ...         3184        cys.com  US Network
# 1    Apparel  Percentage off  25% Off Boys' Quiksilver Apparel. Shop now at ...  ...         3184        cys.com  US Network
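
For anyone who wants to try the idea without the original feed, here is a minimal sketch using only the standard library. The sample XML is an assumption, loosely modeled on the column names in the output above; the real document is not available. The tag name item and the helper link_to_row are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real feed from the question is not available.
SAMPLE_XML = """
<couponfeed>
  <link type="TEXT">
    <categories>
      <category id="1">Apparel</category>
    </categories>
    <promotiontypes>
      <promotiontype id="11">Percentage off</promotiontype>
    </promotiontypes>
    <offerdescription>25% Off Boys Quiksilver Apparel.</offerdescription>
    <advertiserid>3184</advertiserid>
    <advertisername>cys.com</advertisername>
    <network id="1">US Network</network>
  </link>
</couponfeed>
"""

root = ET.fromstring(SAMPLE_XML)


def link_to_row(lnk):
    """Flatten one <link> element into a dict keyed by child tag name."""
    row = {}
    for item in lnk.findall("*"):
        text = (item.text or "").strip()  # item.text is None for empty elements
        # Fall back to the first sub-element's text when the child only wraps others.
        row[item.tag] = text if text else item.findtext("*")
    return row


# Searching for a tag that is not in the document returns an empty list, not an error.
missing = root.findall(".//item")  # "item" is a hypothetical, non-existent tag

# Searching for the tag that does exist yields one row per <link>.
rows = [link_to_row(lnk) for lnk in root.findall(".//link")]

print(missing)
print(rows)
```

The rows list can be passed straight to pd.DataFrame(rows) if pandas is installed. An empty result from findall is usually the first sign that the tag name does not match anything in the document.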
