I was trying to parse the following XML and fetch the specific tags I'm interested in for my business need, and I guess I'm doing something wrong. Not sure how to parse my
Another method.
from simplified_scrapy import SimplifiedDoc, utils, req
# html = req.get('http://couponfeed.synergy.com/coupon?token=xxxxxxxxx122b&network=1&resultsperpage=500')
html = '''
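<couponfeed>
1459
3
1
<link>
<category>Apparel</category>
<promotiontype>Percentage off</promotiontype>
<offerdescription>25% Off Boys Quiksilver Apparel. Shop now at Macys.com! Valid 7/23 through 7/25!</offerdescription>
<offerstartdate>2020-07-24</offerstartdate>
<offerenddate>2020-07-26</offerenddate>
<clickurl>https://click.synergy.com/fs-bin/click?id=Z&offerid=777210.100474694&type=3&subid=0</clickurl>
<impressionpixel>https://ad.synergy.com/fs-bin/show?id=ZNAweM&bids=777210.100474694&type=3&subid=0</impressionpixel>
3184
<advertisername>cys.com</advertisername>
<network>US Network</network>
</link>
</couponfeed>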
'''
doc = SimplifiedDoc(html)
df_cols = [
    "promotiontype", "category", "offerdescription", "offerstartdate",
    "offerenddate", "clickurl", "impressionpixel", "advertisername", "network"
]
rows = [df_cols]
links = doc.couponfeed.links  # Get all links
for link in links:
    row = []
    for col in df_cols:
        row.append(link.select(col).text)  # Get col text
    rows.append(row)
utils.save2csv('merchants_offers_share.csv', rows)  # Save to csv file
Result:
promotiontype,category,offerdescription,offerstartdate,offerenddate,clickurl,impressionpixel,advertisername,network
Percentage off,Apparel,25% Off Boys Quiksilver Apparel. Shop now at Macys.com! Valid 7/23 through 7/25!,2020-07-24,2020-07-26,https://click.synergy.com/fs-bin/click?id=Z&offerid=777210.100474694&type=3&subid=0,https://ad.synergy.com/fs-bin/show?id=ZNAweM&bids=777210.100474694&type=3&subid=0,cys.com,US Network
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
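For comparison, the same extraction can be sketched with only the Python standard library. This is a minimal sketch, not the method used above: it assumes the feed is well-formed XML with a `<couponfeed>` root and `<link>` elements whose descendants are named after the columns. The sample feed is a hypothetical, abbreviated stand-in, and `links_to_rows` and `rows_to_csv` are illustrative names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated feed: the tag names come from the column list,
# the nesting is an assumption.
SAMPLE_XML = """<couponfeed>
  <link>
    <category>Apparel</category>
    <promotiontype>Percentage off</promotiontype>
    <offerstartdate>2020-07-24</offerstartdate>
    <offerenddate>2020-07-26</offerenddate>
    <network>US Network</network>
  </link>
</couponfeed>"""

COLUMNS = ["promotiontype", "category", "offerstartdate", "offerenddate", "network"]


def links_to_rows(xml_text, columns):
    """Return a header row followed by one row per <link> element."""
    root = ET.fromstring(xml_text)
    rows = [list(columns)]
    for link in root.iter("link"):
        row = []
        for col in columns:
            node = link.find(".//" + col)  # first descendant with that tag
            # A missing or empty tag becomes an empty cell instead of raising.
            row.append(node.text.strip() if node is not None and node.text else "")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with plain "\\n" line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = links_to_rows(SAMPLE_XML, COLUMNS)
print(rows_to_csv(rows), end="")
```

Unlike a lenient HTML-style parser, `ElementTree` rejects input that is not well-formed XML, for example URLs containing a raw `&` instead of `&amp;`, so a real feed may need to be checked first.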
Remove the last empty row:
import io
with io.open('merchants_offers_share.csv', "rb+") as f:
    f.seek(-1, 2)  # Move to the last byte of the file
    l = f.read()
    if l == b"\n":
        # Drop the final two bytes (e.g. a trailing "\r\n")
        f.seek(-2, 2)
        f.truncate()
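The snippet above always removes two bytes, which fits a file ending in `\r\n`. If the file might end in a bare `\n`, or be empty, a slightly more defensive variant could look like the sketch below. `strip_trailing_newline` is a hypothetical helper name, and the temporary file only stands in for the CSV.

```python
import os
import tempfile


def strip_trailing_newline(path):
    # Remove exactly one trailing line ending (CRLF or LF) from the file, if present.
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return  # Empty file: nothing to strip
        f.seek(max(size - 2, 0))  # Read at most the last two bytes
        tail = f.read()
        if tail.endswith(b"\r\n"):
            f.truncate(size - 2)
        elif tail.endswith(b"\n"):
            f.truncate(size - 1)


# Usage on a throwaway file that stands in for the CSV.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"a,b\r\n1,2\r\n")
strip_trailing_newline(tmp.name)
with open(tmp.name, "rb") as f:
    print(f.read())
os.remove(tmp.name)
```

Calling it again on the same file would strip nothing further, since the last row no longer ends with a line terminator.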