.htaccess for SEO bots crawling single page applications without hashbangs

Backend · Open · 4 answers · 1677 views

后悔当初 2021-01-12 09:21

With a pushState-enabled page, you would normally redirect SEO bots using the escaped_fragment convention. You can read more about that here.
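
For context, the escaped_fragment handshake is usually wired up in .htaccess roughly like this sketch (the `/snapshot.php` endpoint name is my assumption, not something from the question — substitute whatever serves your pre-rendered HTML):

```apache
# When a crawler requests ?_escaped_fragment_=..., serve a pre-rendered
# snapshot instead of the JS application shell.
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)_escaped_fragment_=
RewriteRule ^(.*)$ /snapshot.php?path=$1 [L]
```

The catch, as the question implies, is that crawlers only send `_escaped_fragment_` for hashbang (`#!`) URLs or pages that opt in via `<meta name="fragment" content="!">`; plain pushState URLs never trigger it, which is why the accepted answer below sniffs the User-Agent instead.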

4 Answers
  •  孤街浪徒
    2021-01-12 09:49

    I'm using PhantomJS to generate static snapshots of my pages. My directory structure is only one level deep (root and /projects), so I have two .htaccess files, in which I redirect to a PHP file (index-bots.php) that starts a PhantomJS process pointed at my SPA index.html and prints out the rendered static pages.

    The .htaccess files look like this:

    /.htaccess

    # Redirect search-engine bots to index-bots.php
    # in order to serve rendered HTML via PhantomJS
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (bot|crawl|slurp|spider) [NC]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_URI} !^/index-bots\.php [NC]
    RewriteRule ^(.*)$ index-bots.php?url=%{REQUEST_URI} [L,QSA]
    

    /projects/.htaccess

    # Redirect search-engine bots to index-bots.php
    # in order to serve rendered HTML via PhantomJS
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (bot|crawl|slurp|spider) [NC]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ ../index-bots.php?url=%{REQUEST_URI} [L,QSA]
    

    A few notes:

    • The !-f RewriteCond is critical! Since .htaccess applies RewriteRules to every request, each asset on your page (scripts, stylesheets, images) would otherwise be rewritten to the PHP file too, spinning up multiple PhantomJS instances per page view and bringing your server to its knees.
    • It's also important to exempt index-bots.php itself from the rewrites, to avoid an endless rewrite loop.
    • I strip out the JS in my PhantomJS runner script, to ensure the JS doesn't do anything when bots that support it come across the 'static' pages.
    • I'm no .htaccess wizard, so there's probably a better way to do this. I'd love to hear it if so.
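
    The index-bots.php and PhantomJS runner aren't shown in the answer. A minimal sketch of what such a runner might look like, using PhantomJS's `webpage` module — the file name, the render delay, and the script-stripping step are my assumptions based on the notes above, not the author's actual code:

    ```javascript
    // save-page.js — hypothetical PhantomJS runner
    // usage: phantomjs save-page.js http://example.com/some/route
    var page = require('webpage').create();
    var system = require('system');
    var url = system.args[1];

    page.open(url, function (status) {
        if (status !== 'success') {
            phantom.exit(1);
        }
        // give the SPA a moment to fetch data and render
        setTimeout(function () {
            // strip <script> tags so the 'static' snapshot carries no JS
            page.evaluate(function () {
                var scripts = document.getElementsByTagName('script');
                while (scripts.length) {
                    scripts[0].parentNode.removeChild(scripts[0]);
                }
            });
            // print the rendered HTML; index-bots.php would capture this
            console.log(page.content);
            phantom.exit(0);
        }, 500);
    });
    ```

    A fixed 500 ms delay is the crudest possible readiness check; a more robust runner would poll for an application-specific "rendered" flag before snapshotting.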
