Question
I am learning Scrapy while building a web spider with it. I need help extracting some information from the following JavaScript code:
<script language="JavaScript" type="text/javascript+gk-onload">
SKART = (SKART) ? SKART : {};
SKART.analytics = SKART.analytics || {};
SKART.analytics["category"] = "television";
SKART.analytics["vertical"] = "television";
SKART.analytics["supercategory"] = "homeentertainmentlarge";
SKART.analytics["subcategory"] = "television";
</script>
I want to extract the category value, television, using XPath. Which selectors should I use?
Answer 1:
You can use the Selector's built-in support for regular expressions through re(). XPath only locates the script element; the regular expression then pulls the value out of its text:
pattern = r'SKART\.analytics\["category"\] = "(\w+)";'
response.xpath('//script[@type="text/javascript+gk-onload"]').re(pattern)
Demo (using scrapy shell):
$ scrapy shell index.html
In [1]: pattern = r'SKART\.analytics\["category"\] = "(\w+)";'
In [2]: response.xpath('//script[@type="text/javascript+gk-onload"]').re(pattern)
Out[2]: [u'television']
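If more than one of the analytics values is needed, the same regular-expression idea can be widened to capture every key at once. The following is only a rough sketch using Python's standard `re` module on the script text from the question; the names `SCRIPT_TEXT`, `ANALYTICS_PATTERN`, and `extract_analytics` are made up for illustration, and in a real spider the text would come from the matched script element rather than a literal string.

```python
import re

# Script body copied from the question. Assumption: in a spider this text
# would be taken from the selected <script> element, not hard-coded.
SCRIPT_TEXT = '''
SKART = (SKART) ? SKART : {};
SKART.analytics = SKART.analytics || {};
SKART.analytics["category"] = "television";
SKART.analytics["vertical"] = "television";
SKART.analytics["supercategory"] = "homeentertainmentlarge";
SKART.analytics["subcategory"] = "television";
'''

# Two capture groups: the key inside the brackets and the quoted value.
ANALYTICS_PATTERN = re.compile(r'SKART\.analytics\["(\w+)"\]\s*=\s*"([^"]*)";')


def extract_analytics(script_text):
    """Hypothetical helper: map each SKART.analytics key to its value."""
    # findall returns (key, value) tuples because the pattern has two groups.
    return dict(ANALYTICS_PATTERN.findall(script_text))


analytics = extract_analytics(SCRIPT_TEXT)
print(analytics["category"])
```

Building a dict keeps the lookup by key explicit, so a missing field raises a KeyError instead of silently shifting positions in a list.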
Source: https://stackoverflow.com/questions/29163395/scrapy-and-xpath-to-extract-data-from-javascript-code