Cannot locate displayed data in source code when Scraping with Scrapy


I am using the Python.org build of Python 2.7 (64-bit) on Windows Vista (64-bit). I am using a combination of Scrapy and regex to extract information from a JavaScript item called 'DataStore.P


1 answer

执念已碎

2021-01-25 05:42

The fixtures are not in the page source because the page loads them with a separate XHR request. Simulate that request and read the data from its response.

For example, fixtures for Jan 2014:

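from ast import literal_eval
from datetime import datetime
import requests

# Month to fetch; the 'd' parameter is sent in YYYYMM form.
date = datetime(year=2014, month=1, day=1)
url = 'http://www.whoscored.com/tournamentsfeed/8273/Fixtures/'

params = {'d': date.strftime('%Y%m'), 'isAggregate': 'false'}
# Send a browser-like User-Agent along with the request.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}

response = requests.get(url, params=params, headers=headers)

# The body is a Python literal (a list of lists), not JSON.
# On Python 3, pass response.text instead: literal_eval() rejects bytes there.
fixtures = literal_eval(response.content)
print(fixtures)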

Prints:

[
    [789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn', 0, 167, 'Manchester City', 1, '1 : 1', '0 : 1', 1, 1, 'FT', '0', 0, 0, 4, 1], 
    [789693, 1, 'Saturday, Jan 4 2014', '15:00', 31, 'Everton', 0, 171, 'Queens Park Rangers', 0, '4 : 0', '2 : 0', 1, 0, 'FT', '1', 0, 0, 1, 0],
    ...
]
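
Each row is a flat list, so individual fields can be picked out by position. The sketch below is only illustrative: the field meanings (match id, date, kick-off time, home team, away team, full-time and half-time score) are guessed from the sample output above rather than documented by the site, and `summarize_fixture` is a hypothetical helper.

```python
# Two sample rows copied from the output above.
fixtures = [
    [789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn', 0, 167,
     'Manchester City', 1, '1 : 1', '0 : 1', 1, 1, 'FT', '0', 0, 0, 4, 1],
    [789693, 1, 'Saturday, Jan 4 2014', '15:00', 31, 'Everton', 0, 171,
     'Queens Park Rangers', 0, '4 : 0', '2 : 0', 1, 0, 'FT', '1', 0, 0, 1, 0],
]


def summarize_fixture(row):
    """Pick out the fields that look meaningful (positions are assumptions)."""
    return {
        'match_id': row[0],
        'date': row[2],
        'time': row[3],
        'home': row[5],
        'away': row[8],
        'score': row[10],
        'half_time': row[11],
    }


summaries = [summarize_fixture(row) for row in fixtures]
for s in summaries:
    print('%s %s: %s %s %s' % (s['date'], s['time'], s['home'], s['score'], s['away']))
```

If the feed layout ever changes, only the index mapping inside the helper needs updating.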

Note that the response is not JSON; it is essentially a dump of a Python list of lists, which you can load with ast.literal_eval(). From its documentation:

Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
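
To illustrate why a JSON parser does not fit here, the sketch below feeds a shortened, hard-coded sample in the same shape as the feed (not a live response) to both `json.loads` and `ast.literal_eval`. Single-quoted strings are invalid JSON but valid Python literals, and `literal_eval` still refuses anything that is not a plain literal.

```python
import json
from ast import literal_eval

# Shortened sample in the same shape as the feed; hard-coded for illustration.
payload = "[[789692, 1, 'Saturday, Jan 4 2014', '12:45', 158, 'Blackburn']]"

try:
    json.loads(payload)
    parsed_as_json = True
except ValueError:  # json.JSONDecodeError is a ValueError subclass on Python 3
    parsed_as_json = False

# literal_eval accepts the single-quoted strings and returns real Python objects.
rows = literal_eval(payload)

# Anything beyond plain literals, such as a function call, is rejected.
try:
    literal_eval("__import__('os')")
    call_rejected = False
except ValueError:
    call_rejected = True

print(parsed_as_json)
print(rows[0][5])
print(call_rejected)
```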
