Web scraper for dynamic forms in Python

無奈伤痛 2021-01-07 00:25

I am trying to fill in the form on this website: http://www.marutisuzuki.com/Maruti-Price.aspx.

It consists of three drop-down lists. One is the model of the car, the second is

2 Answers
  • 2021-01-07 00:48

    If you watch the requests sent to that site in your browser's developer tools, you'll see that a POST is sent as soon as you select a state. The response contains the same form, now with the city dropdown populated.
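
To make that POST concrete, here is a minimal sketch of how such a request body could be assembled by hand. It assumes the page is a typical ASP.NET WebForms page (the `.aspx` URL and the `ctl00$...` field names suggest this, but it is not confirmed here), so that hidden inputs such as `__VIEWSTATE` must be echoed back and `__EVENTTARGET` names the control that changed. The sample HTML, the helper names, and the state code `DL` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def build_postback_body(html, event_target, selections):
    """Return a urlencoded body resembling the POST sent on a dropdown change."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)         # echo hidden state back to the server
    payload["__EVENTTARGET"] = event_target  # assumed WebForms convention
    payload.update(selections)               # the dropdown values chosen so far
    return urlencode(payload)


# Hypothetical, heavily shortened page source for illustration only.
SAMPLE_HTML = """
<form id="form1" method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <select name="ctl00$ContentPlaceHolder1$ddlState"></select>
</form>
"""

body = build_postback_body(
    SAMPLE_HTML,
    event_target="ctl00$ContentPlaceHolder1$ddlState",
    selections={
        "ctl00$ContentPlaceHolder1$ddlmodel": "AK",
        "ctl00$ContentPlaceHolder1$ddlState": "DL",  # hypothetical state code
    },
)
print(body)
```

A form-aware library such as mechanize does this bookkeeping for you when you submit the selected form, which is why the approach in this answer never touches the hidden fields directly.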

    So, to replicate this in your script you want something like the following:

    • Open the page
    • Select the form
    • Select values for model and state
    • Submit the form
    • Select the form from the response sent back
    • Select value for city (it should be populated now)
    • Submit the form
    • Parse the response for the table of results

    That will look something like:

    #!/usr/bin/env python
    
    import re
    import mechanize
    
    from bs4 import BeautifulSoup
    
    def select_form(form):
        return form.attrs.get('id', None) == 'form1'
    
    def get_state_items(browser):
        browser.select_form(predicate=select_form)
        ctl = browser.form.find_control('ctl00$ContentPlaceHolder1$ddlState')
        state_items = ctl.get_items()
        return state_items[1:]
    
    def get_city_items(browser):
        browser.select_form(predicate=select_form)
        ctl = browser.form.find_control('ctl00$ContentPlaceHolder1$ddlCity')
        city_items = ctl.get_items()
        return city_items[1:]
    
    br = mechanize.Browser()
    br.open('http://www.marutisuzuki.com/Maruti-Price.aspx')
    br.select_form(predicate=select_form)
    br.form['ctl00$ContentPlaceHolder1$ddlmodel'] = ['AK']  # model = Maruti Suzuki Alto K10
    
    for state in get_state_items(br):
        # 1 - Submit form for state.name to get cities for this state
        br.select_form(predicate=select_form)
        br.form['ctl00$ContentPlaceHolder1$ddlState'] = [state.name]
        br.submit()
    
        # 2 - Now the city dropdown is filled for state.name
        for city in get_city_items(br):
            br.select_form(predicate=select_form)
            br.form['ctl00$ContentPlaceHolder1$ddlCity'] = [city.name]
            br.submit()
    
            # Name the parser explicitly so bs4 does not have to guess one.
            s = BeautifulSoup(br.response().read(), 'html.parser')
            t = s.find('table', id='ContentPlaceHolder1_dtDealer')
            if t is None:
                continue  # no dealer table in this response
            r = re.compile(r'^ContentPlaceHolder1_dtDealer_lblName_\d+$')
    
            header_printed = False
            for p in t.find_all('span', id=r):
                tr = p.find_parent('tr')
                td = tr.find_all('td')
    
                if not header_printed:
                    # Use a name that does not shadow the built-in str.
                    header = '%s, %s' % (city.attrs['label'], state.attrs['label'])
                    print(header)
                    print('-' * len(header))
                    header_printed = True
    
                print(' '.join(x.text.strip() for x in td))
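
The last step, parsing the response for the table of results, can be tried offline. The sketch below is not the code from this answer: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra installs, and it collects every row of the table rather than only rows containing a `lblName` span. The sample markup, the dealer names, and the `dealer_rows` helper are hypothetical; the real page layout may differ.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the cell text of every row inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # > 0 while inside the target table
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth and tag == "tr":
            self.rows.append([])
        elif self.depth and tag == "td":
            if not self.rows:           # tolerate a <td> without a <tr>
                self.rows.append([])
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def dealer_rows(html, table_id="ContentPlaceHolder1_dtDealer"):
    """Return a list of rows, each a list of stripped cell strings."""
    collector = TableRowCollector(table_id)
    collector.feed(html)
    return [[cell.strip() for cell in row] for row in collector.rows if row]


# Hypothetical response fragment; names and addresses are invented.
SAMPLE_HTML = """
<table id="ContentPlaceHolder1_dtDealer">
  <tr><td><span id="ContentPlaceHolder1_dtDealer_lblName_0">Example Motors</span></td>
      <td>12 Example Road</td></tr>
  <tr><td><span id="ContentPlaceHolder1_dtDealer_lblName_1">Sample Autos</span></td>
      <td>34 Sample Street</td></tr>
</table>
"""

for row in dealer_rows(SAMPLE_HTML):
    print(" ".join(row))
```

Testing the parsing against a saved response like this keeps the scraper from hitting the site repeatedly while you adjust the extraction logic.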
    
  • 2021-01-07 00:48

    I had the same issue with the tutorial. Adding the missing option to the control by hand with mechanize.Item worked for me:

    item = mechanize.Item(br.form.find_control(name='searchAuxCountryID'), {'contents': '3', 'value': '3', 'label': 3})
    