Python XML Parse and getElementsByTagName

感动是毒 2021-01-21 06:05

I was trying to parse the following XML and fetch the specific tags I'm interested in for my business need, and I guess I'm doing something wrong. Not sure how to parse my

3 answers
  •  囚心锁ツ
    2021-01-21 06:25

    Assuming there is no issue parsing your XML from the URL (the link is not available on our end), your first lxml attempt can work if you search for nodes that actually exist. Specifically, the node you are searching for does not appear in the XML document.

    Instead, search for link nodes. Consider a nested list/dict comprehension to migrate the content to a data frame. With lxml, you can swap findall for xpath and get the same result.

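    import pandas as pd

    # One row per <link>, one column per child tag. If a child has no text of
    # its own, fall back to the text of its first sub-element (None if absent).
    df = pd.DataFrame([{item.tag: item.text if (item.text or "").strip() else item.findtext("*")
                        for item in lnk.findall("*")}
                       for lnk in root.findall('.//link')])

    print(df)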
    #   categories  promotiontypes                                   offerdescription  ... advertiserid advertisername     network
    # 0    Apparel  Percentage off  25% Off Boys Quiksilver Apparel. Shop now at M...  ...         3184        cys.com  US Network
    # 1    Apparel  Percentage off  25% Off Boys' Quiksilver Apparel. Shop now at ...  ...         3184        cys.com  US Network
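
For anyone who wants to try the approach without the original feed, here is a minimal sketch using only the standard library. The sample XML is an assumption: the real document is not shown in this thread, so the `result` wrapper and the nested `category` / `promotiontype` children are hypothetical, chosen only so that the fallback to the first sub-element is exercised. The helper name `link_rows` is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real feed's structure is not shown in the thread.
SAMPLE = """
<result>
  <link>
    <categories>
      <category>Apparel</category>
    </categories>
    <promotiontypes>
      <promotiontype>Percentage off</promotiontype>
    </promotiontypes>
    <advertiserid>3184</advertiserid>
    <advertisername>cys.com</advertisername>
    <network>US Network</network>
  </link>
</result>
"""


def link_rows(root):
    """Return one dict per <link> element, keyed by child tag name."""
    rows = []
    for lnk in root.findall(".//link"):
        row = {}
        for item in lnk.findall("*"):
            # item.text can be None or whitespace when the value is nested.
            text = (item.text or "").strip()
            row[item.tag] = text if text else item.findtext("*")
        rows.append(row)
    return rows


root = ET.fromstring(SAMPLE)
rows = link_rows(root)
print(rows)

# Searching for a tag that is not in the document returns an empty list,
# not an error, which is why the result silently comes back empty.
print(root.findall(".//missing"))
```

Passing `rows` to `pd.DataFrame(rows)` should give the same kind of frame as the comprehension above. With lxml, `root.xpath("//link")` and `lnk.xpath("*")` select the same elements as the two `findall` calls here.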
    
