Python XML Parse and getElementsByTagName

感动是毒 2021-01-21 06:05

I was trying to parse the following XML and fetch the specific tags I'm interested in for my business needs, and I guess I'm doing something wrong. Not sure how to parse my
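Since the title names getElementsByTagName, here is a minimal sketch of that approach using the standard library's xml.dom.minidom. The question's XML is cut off, so the sample below is hypothetical: its element names are assumed from the column names used in the answer, and the helper text_of is an illustrative name, not part of any library.

```python
from xml.dom import minidom

# Hypothetical sample: element names are assumed from the answer's column list,
# not copied from the real feed.
XML = """<couponfeed>
  <link>
    <category>Apparel</category>
    <promotiontype>Percentage off</promotiontype>
    <offerdescription>25% Off Boys Quiksilver Apparel.</offerdescription>
    <offerstartdate>2020-07-24</offerstartdate>
    <offerenddate>2020-07-26</offerenddate>
  </link>
</couponfeed>"""


def text_of(parent, tag):
    """Return the text of the first <tag> below parent, or '' if it is absent."""
    nodes = parent.getElementsByTagName(tag)  # searches all descendants
    if not nodes:
        return ""
    # Join every text/CDATA child; firstChild alone can miss split text nodes.
    return "".join(
        n.data for n in nodes[0].childNodes
        if n.nodeType in (n.TEXT_NODE, n.CDATA_SECTION_NODE)
    ).strip()


dom = minidom.parseString(XML)
for link in dom.getElementsByTagName("link"):
    print(text_of(link, "promotiontype"), "|", text_of(link, "offerenddate"))
```

Returning an empty string for a missing tag keeps one absent field from raising an IndexError for the whole record.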

3 answers

悲&欢浪女 (original poster)

2021-01-21 06:11

Another approach, using the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc, utils, req
# html = req.get('http://couponfeed.synergy.com/coupon?token=xxxxxxxxx122b&network=1&resultsperpage=500')
html = '''

 1459
 3
 1
 
  
   Apparel
  
  
    Percentage off
   
   25% Off Boys Quiksilver Apparel. Shop now at Macys.com! Valid 7/23 through 7/25!
   2020-07-24
   2020-07-26
   https://click.synergy.com/fs-bin/click?id=Z&offerid=777210.100474694&type=3&subid=0
    https://ad.synergy.com/fs-bin/show?id=ZNAweM&bids=777210.100474694&type=3&subid=0
    3184
    cys.com
    US Network
  
 
'''
doc = SimplifiedDoc(html)
df_cols = [
    "promotiontype", "category", "offerdescription", "offerstartdate",
    "offerenddate", "clickurl", "impressionpixel", "advertisername", "network"
]
rows = [df_cols]

links = doc.couponfeed.links  # Get all links
for link in links:
    row = []
    for col in df_cols:
        row.append(link.select(col).text)  # Get col text
    rows.append(row)

utils.save2csv('merchants_offers_share.csv', rows)  # Save to csv file
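The same extraction can also be done without a third-party package. Below is a minimal sketch with the standard library's xml.etree.ElementTree and csv modules. It assumes each offer is a <link> element under <couponfeed> (as doc.couponfeed.links above implies) and that each field is an element named after its column. The <categories> and <promotiontypes> wrappers in the sample and the helper name feed_to_rows are assumptions for illustration.

```python
import csv
import xml.etree.ElementTree as ET

# Same column order as df_cols in the answer.
COLUMNS = [
    "promotiontype", "category", "offerdescription", "offerstartdate",
    "offerenddate", "clickurl", "impressionpixel", "advertisername", "network",
]

# Hypothetical, trimmed sample; the real feed's exact markup is assumed here.
XML = """<couponfeed>
  <link>
    <categories><category>Apparel</category></categories>
    <promotiontypes><promotiontype>Percentage off</promotiontype></promotiontypes>
    <offerdescription>25% Off Boys Quiksilver Apparel.</offerdescription>
    <offerstartdate>2020-07-24</offerstartdate>
    <offerenddate>2020-07-26</offerenddate>
    <advertisername>cys.com</advertisername>
    <network>US Network</network>
  </link>
</couponfeed>"""


def feed_to_rows(xml_text, columns):
    """Return [header, row, ...] with one row per <link>; missing tags become ''."""
    root = ET.fromstring(xml_text)
    rows = [list(columns)]
    for link in root.iter("link"):
        row = []
        for col in columns:
            # ".//col" searches all descendants, so wrapper elements are skipped.
            el = link.find(".//" + col)
            row.append((el.text or "").strip() if el is not None else "")
        rows.append(row)
    return rows


rows = feed_to_rows(XML, COLUMNS)
with open("merchants_offers_share.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```

Using ".//" + col in find() keeps the lookup working whether or not a field sits inside a wrapper element. Opening the file with newline="" is what the csv module documentation recommends to avoid doubled line breaks on Windows.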

Result:

promotiontype,category,offerdescription,offerstartdate,offerenddate,clickurl,impressionpixel,advertisername,network
Percentage off,Apparel,25% Off Boys Quiksilver Apparel. Shop now at Macys.com! Valid 7/23 through 7/25!,2020-07-24,2020-07-26,https://click.synergy.com/fs-bin/click?id=Z&offerid=777210.100474694&type=3&subid=0,https://ad.synergy.com/fs-bin/show?id=ZNAweM&bids=777210.100474694&type=3&subid=0,cys.com,US Network

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

To remove the last empty row (the trailing line break at the end of the CSV file):

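import os

# Strip the final line terminator so the CSV does not end with an empty row.
with open('merchants_offers_share.csv', 'rb+') as f:
    size = f.seek(0, os.SEEK_END)        # seek() returns the new position, i.e. the file size
    f.seek(max(size - 2, 0))             # guard against files shorter than 2 bytes
    tail = f.read()
    if tail.endswith(b"\r\n"):
        f.truncate(size - 2)             # "\r\n" terminator
    elif tail.endswith(b"\n"):
        f.truncate(size - 1)             # bare "\n" terminator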
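Alternatively, the trailing line break can be avoided when the file is written instead of being truncated afterwards. Below is a minimal sketch with the standard csv module. It does not use simplified_scrapy's utils.save2csv, whose internals are not shown here. The helper name rows_to_csv_text and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows with csv.writer, then drop only the final line terminator."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # csv.writer's default lineterminator is "\r\n"
    text = buf.getvalue()
    return text[:-2] if text.endswith("\r\n") else text


# Hypothetical sample rows, shaped like the answer's header + data rows.
rows = [["promotiontype", "category"], ["Percentage off", "Apparel"]]
with open("merchants_offers_share.csv", "w", newline="", encoding="utf-8") as f:
    f.write(rows_to_csv_text(rows))  # newline="" writes "\r\n" unchanged
```

Building the text in memory is fine for a feed of a few hundred offers; for very large exports the truncate approach avoids holding the whole file in memory.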
