Question
I'm trying to use scrapy-splash to load a JavaScript-based page and get its rendered HTML, but the response contains only the page's JavaScript code, not the rendered content. Why doesn't my spider execute the page's JavaScript?
These are my Scrapy settings:
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
ROBOTSTXT_OBEY = True
COOKIES_ENABLED = True
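This is my spider:
from scrapy import Spider
from scrapy_splash import SplashRequest

class MySpider(Spider):
    name = 'AbcSpider'
    start_urls = ['https://www.xtip.de/de/fussball/deutschland/bundesliga']

    def start_requests(self):
        for url in self.start_urls:
            # Render through Splash: wait 5 s for scripts to run and skip images.
            yield SplashRequest(url=url, callback=self.parse, args={'wait': 5, 'timeout': 90, 'images': 0, 'resource_timeout': 10})

    def parse(self, response):
        # Print the HTML returned by Splash; print() returns None, so it is not yielded.
        print(response.text)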
And this is the response:
<!DOCTYPE html><html lang="en"><head>
<meta charset="utf-8">
<title>XTiP Sportwetten</title>
<base href="/">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=0">
<meta name="google" content="notranslate">
<!-- Do not edit the brand path manually - it is dynamically replaced by the `gulp whitelabel` task. -->
<link rel="shortcut icon" href="/assets/images/brands/xtip/favicon.ico">
<link rel="stylesheet" href="/assets/scripts/avvpl-player/style.css">
<link rel="apple-touch-icon" sizes="180x180" href="/assets/images/brands/xtip/appletouchicon.png">
<link rel="manifest" href="/assets/images/brands/xtip/site.webmanifest">
<link rel="mask-icon" href="/assets/images/brands/xtip/safari-pinned-tab.svg" color="#000000">
<meta name="msapplication-config" content="/assets/images/brands/xtip/browserconfig.xml">
<link rel="stylesheet" href="styles.35d77b28a6b7e91567f3.css"><script async="" defer="" type="text/javascript" src="/assets/scripts/avvpl-player/avvpl-player.js"></script></head>
<body style="margin:0; padding:0;" class="lock-scroll">
<app>
<div class="loading-indicator loading-indicator--fullscreen">
<div class="loading-indicator__indicator"></div>
</div>
</app>
<script>
var playerTimeout = null;
// load live player script on non-IE browsers
window.addEventListener('load', function() {
var livePlayerScript = document.createElement('script');
livePlayerScript.async = true;
livePlayerScript.defer = true;
livePlayerScript.setAttribute('type', 'text/javascript');
livePlayerScript.setAttribute('src', '/assets/scripts/avvpl-player/avvpl-player.js');
setTimeout(function () {
playerTimeout = document.getElementsByTagName('head')[0].appendChild(livePlayerScript);
}, 1500);
});
window.addEventListener('beforeunload', function() {
if (playerTimeout) {
clearTimeout(playerTimeout);
playerTimeout = null;
}
});
// Init Google Tag Manager
initGtm = function(gtmId) {
(function(w, d, s, l, i) {
w[l] = w[l] || [];
w[l].push({
'gtm.start': new Date().getTime(),
event: 'gtm.js'
});
var f = d.getElementsByTagName("head")[0].firstChild,
j = d.createElement(s),
dl = l != 'dataLayer' ? '&l=' + l : '';
j.async = true;
j.defer = true;
j.src =
'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
f.parentNode.insertBefore(j, f);
})(window, document, 'script', 'dataLayer', gtmId);
}
</script>
<script type="text/javascript" src="runtime.f814f4a94e7a2798c806.js"></script><script type="text/javascript" src="polyfills.7f53e67987a16d32c646.js"></script><script type="text/javascript" src="vendor.8868912c3b507f35b3dd.js"></script><script type="text/javascript" src="main.b109b301d1c74567b8fa.js"></script>
</body></html>
I have already read many posts here but didn't find any where the whole JavaScript code was received as the response.
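To make the symptom easy to detect from inside the spider, the response can be checked for the loading placeholder. This is only a minimal sketch: it assumes the page's root element is `<app>` and that a root containing nothing but the `loading-indicator` markup means the application never started, as in the response shown in this question. The function name `app_root_is_unrendered` is hypothetical and not part of Scrapy or scrapy-splash.

```python
import re


def app_root_is_unrendered(html: str) -> bool:
    """Return True if the <app> root holds only the loading placeholder."""
    # Match the root element, allowing attributes a framework may add at runtime.
    match = re.search(r"<app\b[^>]*>(.*?)</app>", html, flags=re.DOTALL)
    if match is None:
        # No <app> element found, so this check cannot conclude anything.
        return False
    inner = match.group(1)
    # Remove every tag; an unrendered shell leaves no visible text behind.
    visible_text = re.sub(r"<[^>]+>", "", inner)
    return "loading-indicator" in inner and visible_text.strip() == ""


# Shell markup copied from the response in this question.
sample = """<body><app>
<div class="loading-indicator loading-indicator--fullscreen">
<div class="loading-indicator__indicator"></div>
</div>
</app></body>"""

print(app_root_is_unrendered(sample))
```

In `parse`, calling this on `response.text` and logging a warning when it returns True would show whether a change to the Splash arguments actually made a difference.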
Source: https://stackoverflow.com/questions/60646505/i%c2%b4m-getting-javascript-code-instead-of-rendered-html-content-with-scrapy-splash