.htaccess for SEO bots crawling single page applications without hashbangs

后悔当初 2021-01-12 09:21

Using a pushState-enabled page, you normally redirect SEO bots using the _escaped_fragment_ convention. You can read more about that here.

4 answers
  • 2021-01-12 09:41

    I'm using Symfony2. Other devs tell me that Googlebot and Bingbot execute JavaScript well enough to generate their own HTML snippets, but I don't feel confident relying on that. I also think serving static resources is a better alternative for people running with JS turned off (however unlikely that is), so I'm interested in serving HTML snippets anyway, as long as it's not a hassle. Below is a method I'm thinking of using but haven't tried:

    Here are other SO questions that are similar (one is mine).
    Angularjs vs SEO vs pushState
    HTML snippets for AngularJS app that uses pushState?

    Here's a solution I posted in that question and am considering for myself in case I want to send HTML snippets to bots. This would be a solution for a Symfony2 backend:

    1. Use prerender or another service to generate static snippets of all your pages. Store them somewhere accessible by your router.
    2. In your Symfony2 routing file, create a route that matches your SPA. I have a test SPA running at localhost.com/ng-test/, so my route would look like this:

      # Adding a trailing / to this route breaks it. Not sure why.
      # This is YAML, indented with 4 spaces per level.
      NgTestReroute:
          path: /ng-test/{one}/{two}/{three}/{four}
          defaults:
              _controller: DriverSideSiteBundle:NgTest:ngTestReroute
              'one': null
              'two': null
              'three': null
              'four': null
          methods: [GET]

    3. In your Symfony2 controller, check user-agent to see if it's googlebot or bingbot. You should be able to do this with the code below, and then use this list to target the bots you're interested in (http://www.searchenginedictionary.com/spider-names.shtml)...

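      // HTTP_USER_AGENT can be missing, so fall back to an empty string.
      $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
      // stripos() is case-insensitive and returns false when there is no match.
      if (stripos($userAgent, 'googlebot') !== false)
      {
          // what to do, e.g. return the stored HTML snippet
      }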

    4. If your controller finds a match to a bot, send it the HTML snippet. Otherwise, as in the case with my AngularJS app, just send the user to the index page and Angular will correctly do the rest.
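
    The decision made in steps 3 and 4 can be sketched outside Symfony2 to check the logic. This is a minimal Python sketch, not the controller itself: the bot list, the SNAPSHOT_ROOT directory, and the function names are all assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumed list of crawler substrings; extend it from a spider-name list you trust.
BOT_SUBSTRINGS = ("googlebot", "bingbot")

# Hypothetical directory holding the pre-rendered snippets from step 1.
SNAPSHOT_ROOT = PurePosixPath("/var/www/snapshots")


def is_bot(user_agent):
    """Case-insensitive substring match, like the user-agent check in step 3."""
    ua = (user_agent or "").lower()
    return any(name in ua for name in BOT_SUBSTRINGS)


def choose_template(user_agent, segments):
    """Return the file to serve: a snapshot for bots, the SPA index otherwise.

    `segments` holds the optional route parts {one}..{four}; unused ones are None.
    """
    if not is_bot(user_agent):
        return "index.html"  # the client-side app handles routing from here
    parts = [s for s in segments if s]
    # Only accept simple names so a crafted URL cannot escape SNAPSHOT_ROOT.
    if any(not re.fullmatch(r"[A-Za-z0-9_-]+", p) for p in parts):
        return "index.html"
    name = "/".join(parts) if parts else "index"
    return str(SNAPSHOT_ROOT / "ng-test" / (name + ".html"))
```

    With this layout, a Googlebot request for /ng-test/users/42 would map to /var/www/snapshots/ng-test/users/42.html, while a regular browser always gets index.html. Validating the segments before building the path matters because they come straight from the URL.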

    Also, if your question has been answered, please accept an answer so that I and others can tell what worked for you.

  • 2021-01-12 09:49

    I'm using PhantomJS to generate static snapshots of my pages. My directory structure is only one level deep (root and /projects), so I have two .htaccess files, in which I redirect to a PHP file (index-bots.php) that starts a PhantomJS process pointed at my SPA index.html and prints out the rendered static pages.

    The .htaccess files look like this:

    /.htaccess

    # redirect search engine bots to index-bots.php
    # in order to serve rendered HTML via phantomjs
    RewriteCond %{HTTP_USER_AGENT} (bot|crawl|slurp|spider) [NC]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_URI} !^/index-bots\.php [NC]
    RewriteRule ^(.*)$ index-bots.php?url=%{REQUEST_URI} [L,QSA]
    

    /projects/.htaccess

    # redirect search engine bots to index-bots.php
    # in order to serve rendered HTML via phantomjs
    RewriteCond %{HTTP_USER_AGENT} (bot|crawl|slurp|spider) [NC]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ ../index-bots.php?url=%{REQUEST_URI} [L,QSA]
    

    A couple of notes:

    • The !-f RewriteCond is critical! .htaccess applies RewriteRules to every request, so without it each asset on your page would also be rewritten to the PHP file, spinning up multiple instances of PhantomJS and bringing your server to its knees.
    • It's also important to exempt index-bots.php from the rewrites to avoid an endless loop.
    • I strip out the JS in my PhantomJS runner script, to ensure the JS doesn't do anything when bots that support it come across the 'static' pages.
    • I'm no .htaccess wizard, so there's probably a better way to do this. I'd love to hear it if so.
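
    To see which requests the root .htaccess conditions would send to index-bots.php, the three RewriteCond lines can be modelled as a plain function. This is a rough Python sketch for reasoning about the rules, not something Apache runs; the function name and the is_existing_file flag (standing in for the !-f test) are assumptions.

```python
import re

# Same pattern as the RewriteCond on HTTP_USER_AGENT; [NC] maps to IGNORECASE.
BOT_PATTERN = re.compile(r"(bot|crawl|slurp|spider)", re.IGNORECASE)


def should_rewrite(user_agent, request_uri, is_existing_file):
    """Mirror the three RewriteCond lines from the root .htaccess.

    `is_existing_file` stands in for the `!-f` test on REQUEST_FILENAME.
    """
    if not BOT_PATTERN.search(user_agent or ""):
        return False  # ordinary visitors get the SPA untouched
    if is_existing_file:
        return False  # real assets (JS, CSS, images) are served directly
    if re.match(r"^/index-bots\.php", request_uri, re.IGNORECASE):
        return False  # never rewrite the renderer itself, or the rule loops
    return True
```

    One thing the model makes visible: the pattern is an unanchored substring match, so any user agent that merely contains "bot" is treated as a crawler. A stricter list of known crawler names may be preferable if that is a concern.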
  • 2021-01-12 09:50

    Had a similar problem on a single page web app.

    The only solution I found to this problem was effectively creating static versions of pages for the purpose of making something navigable by the Google (and other) bots.

    You could do this yourself, but there are also services that do exactly this and create your static cache for you (and serve up the snapshots to the bots over their CDN).

    I ended up using SEO4Ajax, although other similar services are available!

  • 2021-01-12 10:02

    I was having the exact same problem. For now, I've modified .htaccess like so:

    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteRule ^$ /snapshots/index.html? [L,NC]
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteRule ^(.*)$ /snapshots/$1.html? [L,NC]
    

    Not sure if there's a better solution, but it's working for me so far. Just be sure to have the directory structure for your snapshots match the URL structure.
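
    To make the URL-to-snapshot mapping concrete, here is a small Python sketch of what the two rules above do to an incoming URL. It only models the rules for checking your snapshot directory layout; the function name and the example.com URLs are assumptions.

```python
from urllib.parse import urlsplit


def snapshot_for(url):
    """Return the snapshot path the two rules would pick, or None if they don't apply."""
    parts = urlsplit(url)
    # RewriteCond: the query string must start with "_escaped_fragment_="
    if not parts.query.startswith("_escaped_fragment_="):
        return None
    # Per-directory rules match the path without its leading slash.
    path = parts.path.lstrip("/")
    if path == "":
        return "/snapshots/index.html"  # first rule: ^$
    return "/snapshots/" + path + ".html"  # second rule: ^(.*)$


print(snapshot_for("http://example.com/?_escaped_fragment_="))
print(snapshot_for("http://example.com/projects/demo?_escaped_fragment_="))
```

    So a request for /projects/demo?_escaped_fragment_= needs a file at /snapshots/projects/demo.html. The trailing ? in each RewriteRule target drops the original query string, which also keeps the conditions from matching again on the rewritten URL.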
