Cannot locate displayed data in source code when Scraping with Scrapy

星月不相逢 2021-01-25 05:38

I am using Python.org version 2.7 64-bit on Windows Vista 64-bit. I am using a combination of Scrapy and regex to extract information from a JavaScript item called 'DataStore.P

1 answer
  • 2021-01-25 05:42

    The fixtures are loaded by a separate XHR request, which is why they do not appear in the page source. Simulate that request and read the data from its response.

    For example, fixtures for Jan 2014:

    from ast import literal_eval
    from datetime import datetime
    import requests
    
    # Month to fetch; the feed takes it as YYYYMM in the 'd' parameter.
    date = datetime(year=2014, month=1, day=1)
    url = 'http://www.whoscored.com/tournamentsfeed/8273/Fixtures/'
    
    params = {'d': date.strftime('%Y%m'), 'isAggregate': 'false'}
    # Send a browser-like User-Agent along with the request.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
    
    response = requests.get(url, params=params, headers=headers)
    
    # response.text is a decoded string, which literal_eval accepts on Python 2 and 3.
    fixtures = literal_eval(response.text)
    print(fixtures)
    

    Prints:

    [
        [789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn', 0, 167, 'Manchester City', 1, '1 : 1', '0 : 1', 1, 1, 'FT', '0', 0, 0, 4, 1], 
        [789693, 1, 'Saturday, Jan 4 2014', '15:00', 31, 'Everton', 0, 171, 'Queens Park Rangers', 0, '4 : 0', '2 : 0', 1, 0, 'FT', '1', 0, 0, 1, 0],
        ...
    ]
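
Each row is a flat list, so giving the columns names can make the result easier to work with. The following is only a sketch: the column meanings (match id at index 0, date at 2, kick-off time at 3, home team at 5, away team at 8, full-time score at 10) are guesses read off the two sample rows above, not something the site documents, and `COLUMNS` and `to_record` are hypothetical names introduced for illustration.

```python
# Two rows copied from the sample output above.
fixtures = [
    [789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn', 0, 167, 'Manchester City', 1, '1 : 1', '0 : 1', 1, 1, 'FT', '0', 0, 0, 4, 1],
    [789693, 1, 'Saturday, Jan 4 2014', '15:00', 31, 'Everton', 0, 171, 'Queens Park Rangers', 0, '4 : 0', '2 : 0', 1, 0, 'FT', '1', 0, 0, 1, 0],
]

# Assumed column positions, inferred from the sample rows only.
COLUMNS = {'match_id': 0, 'date': 2, 'kickoff': 3, 'home': 5, 'away': 8, 'score': 10}


def to_record(row):
    """Pick the assumed columns out of one fixture row (hypothetical helper)."""
    return {name: row[index] for name, index in COLUMNS.items()}


records = [to_record(row) for row in fixtures]
for record in records:
    print('%(date)s %(kickoff)s  %(home)s %(score)s %(away)s' % record)
```

If the feed changes its column order, only the `COLUMNS` mapping would need updating.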
    

    Note that the response is not JSON but essentially a dump of a Python list of lists, which you can load with ast.literal_eval(). From its documentation:

    Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
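
To illustrate why `ast.literal_eval()` fits here, the sketch below uses a shortened, made-up payload in the same shape as the response (single-quoted strings inside nested lists). It is an assumption that the real feed always looks like this; the point is only that `json.loads` rejects such text while `literal_eval` parses it, and that `literal_eval` refuses anything that is not a plain literal.

```python
import json
from ast import literal_eval

# Shortened payload in the same shape as the response above (single-quoted strings).
payload = "[[789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn']]"

# JSON requires double-quoted strings, so this payload is not valid JSON.
try:
    json.loads(payload)
    parsed_as_json = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    parsed_as_json = False

# literal_eval understands Python literal syntax and returns a nested list.
rows = literal_eval(payload)

# Only literals are accepted: a function call in the input is rejected, not executed.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(parsed_as_json)
print(rows[0][5])
print(rejected)
```

Because the input comes from a remote server, this is the reason to prefer `literal_eval` over the built-in `eval`, which would run whatever expression the response contained.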
