I am using the Python.org release of Python 2.7 (64-bit) on Windows Vista (64-bit). I am using a combination of Scrapy and regex to extract information from a JavaScript item called 'DataStore.P
There is an XHR request that loads the fixtures. Simulate it and get the data.
For example, fixtures for Jan 2014:
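from ast import literal_eval
from datetime import datetime
import requests
# The month to fetch; it is sent as a 'YYYYMM' string in the 'd' parameter.
date = datetime(year=2014, month=1, day=1)
url = 'http://www.whoscored.com/tournamentsfeed/8273/Fixtures/'
params = {'d': date.strftime('%Y%m'), 'isAggregate': 'false'}
# Send a browser-like User-Agent header along with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
# Replay the XHR request the page makes to load the fixtures.
response = requests.get(url, params=params, headers=headers)
# The body is a Python literal rather than JSON, so parse it with literal_eval.
fixtures = literal_eval(response.content)
print fixtures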
Prints:
[
[789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn', 0, 167, 'Manchester City', 1, '1 : 1', '0 : 1', 1, 1, 'FT', '0', 0, 0, 4, 1],
[789693, 1, 'Saturday, Jan 4 2014', '15:00', 31, 'Everton', 0, 171, 'Queens Park Rangers', 0, '4 : 0', '2 : 0', 1, 0, 'FT', '1', 0, 0, 1, 0],
...
]
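Each row is a flat list, so it can help to pull out the columns you care about by position. The field names below are guesses based only on the sample output (match id, date, kick-off time, team ids and names, score, half-time score, status); the feed does not label its columns, and the remaining integer columns are left out because their meaning is unclear. `Fixture` and `to_fixture` are hypothetical names, so treat this as a sketch rather than a documented schema.

```python
from collections import namedtuple

# Hypothetical field names for the columns whose meaning can be guessed from
# the sample output; the other integer columns are skipped (meaning unknown).
Fixture = namedtuple(
    'Fixture',
    'match_id date time home_id home away_id away score half_time status')


def to_fixture(row):
    """Pick the recognisable columns out of one raw fixture row (assumed positions)."""
    return Fixture(
        match_id=row[0], date=row[2], time=row[3],
        home_id=row[4], home=row[5],
        away_id=row[7], away=row[8],
        score=row[10], half_time=row[11], status=row[14],
    )


# Two rows copied from the output above.
rows = [
    [789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn', 0, 167,
     'Manchester City', 1, '1 : 1', '0 : 1', 1, 1, 'FT', '0', 0, 0, 4, 1],
    [789693, 1, 'Saturday, Jan 4 2014', '15:00', 31, 'Everton', 0, 171,
     'Queens Park Rangers', 0, '4 : 0', '2 : 0', 1, 0, 'FT', '1', 0, 0, 1, 0],
]

for f in map(to_fixture, rows):
    # The score is a string like '4 : 0'; int() tolerates the surrounding spaces.
    home_goals, away_goals = (int(g) for g in f.score.split(':'))
    print('%s %d-%d %s (%s)' % (f.home, home_goals, away_goals, f.away, f.status))
```

If the column positions turn out to be different for other competitions, only `to_fixture` needs to change.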
Note that the response is not JSON but basically a dump of a Python list of lists, so you can load it with ast.literal_eval():
Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
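The difference is easy to check on a trimmed-down row from the output above: `json.loads` rejects the single-quoted strings (JSON only allows double quotes), while `ast.literal_eval` parses them. The last part illustrates the safety property from the quoted docs: an expression that is not a plain literal, such as a function call, is refused with a `ValueError` instead of being executed.

```python
import ast
import json

# A shortened row in the same style as the feed: single-quoted strings.
payload = "[[789692, 1, 'Saturday, Jan 4 2014', '12:45', 'FT']]"

# json.loads rejects it, because JSON strings must use double quotes.
try:
    json.loads(payload)
    json_ok = True
except ValueError:
    json_ok = False

# ast.literal_eval accepts it, since it is a valid Python literal.
data = ast.literal_eval(payload)

# literal_eval refuses anything that is not a plain literal, e.g. a call.
try:
    ast.literal_eval("__import__('os').getcwd()")
    call_ok = True
except ValueError:
    call_ok = False

print(json_ok)
print(data[0][2])
print(call_ok)
```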