Scraping the response for a selected option in a dropdown list

独厮守ぢ 2020-12-09 14:02

This page lists baseball stats for a selected player and defaults to the most recent year (2014, soon to be 2015): http://www.koreabaseball.com/Record/

2 Answers
  • 2020-12-09 14:33

    An example using Ruby and Mechanize: set the year field on the form, then submit it.

    #!/usr/bin/env ruby
    
    require 'mechanize'
    
    # Don't keep visited pages in memory
    agent = Mechanize.new { |a| a.history.max_size = 0 }
    agent.user_agent = 'Mozilla/5.0'
    
    url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'
    year_field = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'
    
    page = agent.get(url)
    form = page.forms[0]
    
    # Year currently selected in the dropdown (the most recent season by default)
    p form[year_field]
    
    # Select another year (as a string, matching the option value) and post back
    form[year_field] = '2013'
    page = form.submit
    
    # The returned page should now have 2013 selected
    form = page.forms[0]
    p form[year_field]
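    For comparison, the check that p form[...] performs above (which year the dropdown currently has selected) can be sketched in Python with BeautifulSoup. This is an illustrative helper, not part of the original answer: year_options is a hypothetical name, and the sample HTML is a made-up stand-in for the real page, assuming the year dropdown is a plain <select> named ctl00$ctl00$cphContainer$cphContents$ddlYear.

    ```python
    from bs4 import BeautifulSoup

    # Field name taken from the answer; the page structure around it is assumed
    YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


    def year_options(html):
        """Return (selected_year, all_years) for the year dropdown in html."""
        soup = BeautifulSoup(html, 'html.parser')
        dropdown = soup.find('select', {'name': YEAR_FIELD})
        if dropdown is None:
            return None, []
        options = dropdown.find_all('option')
        years = [opt.get('value', opt.text.strip()) for opt in options]
        chosen = dropdown.find('option', selected=True)
        # Like a browser, fall back to the first option if none is marked selected
        if chosen is None and options:
            chosen = options[0]
        if chosen is None:
            return None, years
        return chosen.get('value', chosen.text.strip()), years


    # Hypothetical sample markup, not copied from the real site
    sample = '''
    <form>
      <select name="ctl00$ctl00$cphContainer$cphContents$ddlYear">
        <option value="2014" selected="selected">2014</option>
        <option value="2013">2013</option>
      </select>
    </form>
    '''
    print(year_options(sample))  # ('2014', ['2014', '2013'])
    ```

    Listing the option values this way also tells you which years the site will accept before you post one back.
    
    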
    
  • 2020-12-09 14:50

    Do it in two steps:

    • make a GET request, parse the HTML and extract the form's input values;
    • make a POST request that passes those values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which selects the year.
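    The first step does not have to hard-code each ASP.NET field name. As a variation on that step (not the answer's own code), the sketch below collects every <input type="hidden"> on the page, which picks up __VIEWSTATE, __EVENTVALIDATION and similar fields, then sets the year. build_postback and the sample HTML are hypothetical; visible inputs such as ctl00$ctl00$txtSearchWord are not hidden, so they would still need to be added by hand if the server expects them.

    ```python
    from bs4 import BeautifulSoup

    YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


    def build_postback(html, year):
        """Collect every hidden <input> from the page and add the chosen year."""
        soup = BeautifulSoup(html, 'html.parser')
        data = {
            tag['name']: tag.get('value', '')
            for tag in soup.find_all('input', {'type': 'hidden'})
            if tag.get('name')  # skip unnamed inputs, which a browser never sends
        }
        data[YEAR_FIELD] = str(year)
        return data


    # Hypothetical sample markup standing in for the real page
    sample = '''
    <form method="post">
      <input type="hidden" name="__VIEWSTATE" value="abc123">
      <input type="hidden" name="__EVENTTARGET" value="">
      <input type="hidden" name="__EVENTVALIDATION" value="xyz">
    </form>
    '''
    payload = build_postback(sample, 2013)
    print(payload)
    ```

    The returned dict is what step 2 would send, e.g. session.post(url, data=payload) with a requests.Session, so that the cookies from the GET request are reused.
    
    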

    An implementation for the year 2013, using requests and BeautifulSoup:

    from bs4 import BeautifulSoup
    import requests
    
    url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'
    
    # ASP.NET WebForms state that must be sent back with the postback
    HIDDEN_FIELDS = ['__EVENTTARGET', '__EVENTARGUMENT', '__LASTFOCUS',
                     '__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION']
    
    
    def hidden_value(soup, name):
        """Return the value of the named <input>, or '' if it is missing."""
        tag = soup.find('input', {'name': name})
        return tag.get('value', '') if tag is not None else ''
    
    
    with requests.Session() as session:
        session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'})
    
        # Step 1: GET the page and read the current form state
        response = session.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
    
        data = {name: hidden_value(soup, name) for name in HIDDEN_FIELDS}
        data['ctl00$ctl00$cphContainer$cphContents$ddlYear'] = '2013'  # year to load
        data['ctl00$ctl00$txtSearchWord'] = ''
    
        # Step 2: POST the form back with the new year selected
        response = session.post(url, data=data)
        soup = BeautifulSoup(response.content, 'html.parser')
    
        # Print the cell text of every row in the stats tables
        for row in soup.select('table.tData01 tr'):
            print([td.text for td in row.find_all('td')])
    

    This prints the contents of all stats tables for 2013:

    [u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
    [u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
    [u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
    [u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
    [u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
    [u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
    [u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
    [u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
    ...
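    The u'\ub450\uc0b0'-style entries are just how Python 2 displays non-ASCII strings inside a list: they are Korean team names (두산 Doosan, 롯데 Lotte, 삼성 Samsung, 한화 Hanwha), not corrupted data. Under Python 3, where str is Unicode, the same strings print as readable text:

    ```python
    # Escaped forms copied from the Python 2 output above
    escaped = ['\ub450\uc0b0', '\ub86f\ub370', '\uc0bc\uc131', '\ud55c\ud654']

    for name in escaped:
        print(name)  # prints the Hangul team name, one per line
    ```
    
    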
    