Python XML Parse and getElementsByTagName

感动是毒 2021-01-21 06:05

I am trying to parse the following XML and fetch the specific tags I need for my business case, but I guess I'm doing something wrong. Not sure how to parse my

3 answers
  •  悲&欢浪女
    2021-01-21 06:11

    Another method, using the simplified_scrapy library:

    from simplified_scrapy import SimplifiedDoc, utils, req
    # html = req.get('http://couponfeed.synergy.com/coupon?token=xxxxxxxxx122b&network=1&resultsperpage=500')
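    # Sample feed. The column tags match df_cols below; the wrapper, counter and ID
    # tag names (categories, promotiontypes, TotalMatches, TotalPages,
    # PageNumberRequested, advertiserid) are assumptions.
    html = '''
    <couponfeed>
     <TotalMatches>1459</TotalMatches>
     <TotalPages>3</TotalPages>
     <PageNumberRequested>1</PageNumberRequested>
     <link>
      <categories>
       <category>Apparel</category>
      </categories>
      <promotiontypes>
        <promotiontype>Percentage off</promotiontype>
       </promotiontypes>
       <offerdescription>25% Off Boys Quiksilver Apparel. Shop now at Macys.com! Valid 7/23 through 7/25!</offerdescription>
       <offerstartdate>2020-07-24</offerstartdate>
       <offerenddate>2020-07-26</offerenddate>
       <clickurl>https://click.synergy.com/fs-bin/click?id=Z&offerid=777210.100474694&type=3&subid=0</clickurl>
        <impressionpixel>https://ad.synergy.com/fs-bin/show?id=ZNAweM&bids=777210.100474694&type=3&subid=0</impressionpixel>
        <advertiserid>3184</advertiserid>
        <advertisername>cys.com</advertisername>
        <network>US Network</network>
      </link>
     </couponfeed>
    '''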
    doc = SimplifiedDoc(html)
    df_cols = [
        "promotiontype", "category", "offerdescription", "offerstartdate",
        "offerenddate", "clickurl", "impressionpixel", "advertisername", "network"
    ]
    rows = [df_cols]
    
    links = doc.couponfeed.links  # Get all links
    for link in links:
        row = []
        for col in df_cols:
            row.append(link.select(col).text)  # Get col text
        rows.append(row)
    
    utils.save2csv('merchants_offers_share.csv', rows)  # Save to csv file
    

    Result:

    promotiontype,category,offerdescription,offerstartdate,offerenddate,clickurl,impressionpixel,advertisername,network
    Percentage off,Apparel,25% Off Boys Quiksilver Apparel. Shop now at Macys.com! Valid 7/23 through 7/25!,2020-07-24,2020-07-26,https://click.synergy.com/fs-bin/click?id=Z&offerid=777210.100474694&type=3&subid=0,https://ad.synergy.com/fs-bin/show?id=ZNAweM&bids=777210.100474694&type=3&subid=0,cys.com,US Network
    

    Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
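
For comparison with the `getElementsByTagName` approach named in the question title, here is a minimal sketch that uses only the standard library's `xml.dom.minidom`. It assumes each offer sits in a `<link>` element under `<couponfeed>`, as `doc.couponfeed.links` above suggests. The shortened sample string and the helper names `first_text` and `feed_to_rows` are illustrative and not part of the original answer.

```python
from xml.dom import minidom

DF_COLS = [
    "promotiontype", "category", "offerdescription", "offerstartdate",
    "offerenddate", "clickurl", "impressionpixel", "advertisername", "network",
]

# Shortened sample; the <couponfeed>/<link> layout is an assumption.
# In well-formed XML, "&" inside text must be written as "&amp;".
SAMPLE_XML = """<couponfeed>
  <link>
    <categories><category>Apparel</category></categories>
    <promotiontypes><promotiontype>Percentage off</promotiontype></promotiontypes>
    <offerdescription>25% Off Boys Quiksilver Apparel.</offerdescription>
    <offerstartdate>2020-07-24</offerstartdate>
    <offerenddate>2020-07-26</offerenddate>
    <clickurl>https://click.synergy.com/fs-bin/click?id=Z&amp;offerid=777210.100474694&amp;type=3&amp;subid=0</clickurl>
    <advertisername>cys.com</advertisername>
    <network>US Network</network>
  </link>
</couponfeed>"""


def first_text(node, tag):
    """Return the text of the first <tag> descendant of node, or '' if absent or empty."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return ""
    return matches[0].firstChild.data.strip()


def feed_to_rows(xml_text):
    """Build a header row plus one row per <link> element."""
    dom = minidom.parseString(xml_text)
    rows = [DF_COLS]
    for link in dom.getElementsByTagName("link"):
        rows.append([first_text(link, col) for col in DF_COLS])
    return rows


rows = feed_to_rows(SAMPLE_XML)
print(rows[1])
```

A tag that is missing from an offer (here `impressionpixel`) yields an empty string instead of raising an error. The rows can then be written with `csv.writer(f).writerows(rows)` on a file opened with `newline=''`.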

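    To remove the trailing empty row from the CSV file:

    with open('merchants_offers_share.csv', "rb+") as f:
        f.seek(-1, 2)          # Move to the last byte of the file
        last = f.read()
        if last == b"\n":
            # Cut the final two bytes; this assumes rows end with "\r\n"
            f.seek(-2, 2)
            f.truncate()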
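
The snippet above always cuts two bytes, so it is only safe when rows end with "\r\n". Below is a hedged sketch of a variant that checks which line ending is actually present before truncating. The function name `strip_trailing_newline` is hypothetical and not part of simplified_scrapy.

```python
import os


def strip_trailing_newline(path):
    # Remove one trailing line ending (CRLF or LF) from the file, if present.
    # Returns the number of bytes removed (0, 1 or 2).
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)   # seek() returns the new absolute position
        if size == 0:
            return 0                    # Empty file: nothing to strip
        f.seek(max(size - 2, 0))        # Read at most the last two bytes
        tail = f.read()
        if tail.endswith(b"\r\n"):
            cut = 2
        elif tail.endswith(b"\n"):
            cut = 1
        else:
            return 0                    # No trailing newline
        f.truncate(size - cut)
        return cut
```

Calling `strip_trailing_newline('merchants_offers_share.csv')` after `utils.save2csv(...)` would then leave the last data row as the final line of the file, whichever line ending was written.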
    
