I'm getting JavaScript code instead of rendered HTML content with scrapy-splash

Submitted by 二次信任 on 2020-07-03 17:30:08

Question


I'm trying to use scrapy-splash to load a JavaScript-based page and get its rendered HTML content, but all I get as a response is the JavaScript code. Why doesn't my spider execute the page's JavaScript?

These are my Scrapy settings:

SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
ROBOTSTXT_OBEY = True
COOKIES_ENABLED = True

This is my spider:

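from scrapy import Spider
from scrapy_splash import SplashRequest


class MySpider(Spider):

    name = 'AbcSpider'
    start_urls = ['https://www.xtip.de/de/fussball/deutschland/bundesliga']

    def start_requests(self):
        # Route each start URL through Splash so the page's JavaScript gets a chance to run.
        for url in self.start_urls:
            yield SplashRequest(
                url=url,
                callback=self.parse,
                args={'wait': 5, 'timeout': 90, 'images': 0, 'resource_timeout': 10},
            )

    def parse(self, response):
        # Print the HTML returned by Splash (print() returns None, so there is nothing to yield).
        print(response.text)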

And this is the response:

<!DOCTYPE html><html lang="en"><head>
    <meta charset="utf-8">
    <title>XTiP Sportwetten</title>
    <base href="/">
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=0">
    <meta name="google" content="notranslate">
    <!-- Do not edit the brand path manually - it is dynamically replaced by the `gulp whitelabel` task. -->
    <link rel="shortcut icon" href="/assets/images/brands/xtip/favicon.ico">
    <link rel="stylesheet" href="/assets/scripts/avvpl-player/style.css">
    <link rel="apple-touch-icon" sizes="180x180" href="/assets/images/brands/xtip/appletouchicon.png">
    <link rel="manifest" href="/assets/images/brands/xtip/site.webmanifest">
    <link rel="mask-icon" href="/assets/images/brands/xtip/safari-pinned-tab.svg" color="#000000">
    <meta name="msapplication-config" content="/assets/images/brands/xtip/browserconfig.xml">
<link rel="stylesheet" href="styles.35d77b28a6b7e91567f3.css"><script async="" defer="" type="text/javascript" src="/assets/scripts/avvpl-player/avvpl-player.js"></script></head>
<body style="margin:0; padding:0;" class="lock-scroll">
    <app>
        <div class="loading-indicator loading-indicator--fullscreen">
            <div class="loading-indicator__indicator"></div>
        </div>
    </app>
    <script>
        var playerTimeout = null;
        // load live player script on non-IE browsers
        window.addEventListener('load', function() {
            var livePlayerScript = document.createElement('script');
            livePlayerScript.async = true;
            livePlayerScript.defer = true;
            livePlayerScript.setAttribute('type', 'text/javascript');
            livePlayerScript.setAttribute('src', '/assets/scripts/avvpl-player/avvpl-player.js');
            setTimeout(function () {
                playerTimeout = document.getElementsByTagName('head')[0].appendChild(livePlayerScript);
            }, 1500);
        });
        window.addEventListener('beforeunload', function() {
            if (playerTimeout) {
                clearTimeout(playerTimeout);
                playerTimeout = null;
            }
        });
        // Init Google Tag Manager
        initGtm = function(gtmId) {
            (function(w, d, s, l, i) {
                w[l] = w[l] || [];
                w[l].push({
                    'gtm.start': new Date().getTime(),
                    event: 'gtm.js'
                });
                var f = d.getElementsByTagName("head")[0].firstChild,
                    j = d.createElement(s),
                    dl = l != 'dataLayer' ? '&l=' + l : '';
                j.async = true;
                j.defer = true;
                j.src =
                    'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
                f.parentNode.insertBefore(j, f);
            })(window, document, 'script', 'dataLayer', gtmId);
        }
    </script>
<script type="text/javascript" src="runtime.f814f4a94e7a2798c806.js"></script><script type="text/javascript" src="polyfills.7f53e67987a16d32c646.js"></script><script type="text/javascript" src="vendor.8868912c3b507f35b3dd.js"></script><script type="text/javascript" src="main.b109b301d1c74567b8fa.js"></script>

</body></html>

I have already read many posts here but didn't find any question where the whole JavaScript code was received as the response.
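
In the response above, the `<app>` root element contains only the `loading-indicator` placeholder, which suggests that the page shell was returned before the application rendered anything into it. The following is a minimal diagnostic sketch for detecting that state, using only the Python standard library. It assumes that the `app` tag and the `loading-indicator` class prefix seen in the response identify the unrendered shell; `AppShellProbe` and `looks_unrendered` are hypothetical helper names, not part of Scrapy or scrapy-splash.

```python
from html.parser import HTMLParser


class AppShellProbe(HTMLParser):
    """Collect the class attributes of every element nested inside <app>."""

    def __init__(self):
        super().__init__()
        self.seen_app = False      # True once an <app> start tag has been parsed
        self.in_app = False        # True while the parser is between <app> and </app>
        self.child_classes = []    # class attribute of each element found inside <app>

    def handle_starttag(self, tag, attrs):
        if tag == "app":
            self.seen_app = True
            self.in_app = True
        elif self.in_app:
            # Elements without a class attribute are recorded as an empty string.
            self.child_classes.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag == "app":
            self.in_app = False


def looks_unrendered(html: str) -> bool:
    """Return True if <app> exists and holds nothing but loading-indicator elements.

    Assumption: the 'loading-indicator' class prefix marks the placeholder
    that the application replaces once its JavaScript has run.
    """
    probe = AppShellProbe()
    probe.feed(html)
    return probe.seen_app and all(
        cls.startswith("loading-indicator") for cls in probe.child_classes
    )
```

Calling `looks_unrendered(response.text)` inside `parse` would make it possible to log or retry such responses instead of treating them as rendered content. It only reports the symptom and does not explain why Splash did not render the page.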

Source: https://stackoverflow.com/questions/60646505/i%c2%b4m-getting-javascript-code-instead-of-rendered-html-content-with-scrapy-splash
