googlebot

How to set up a robots.txt that only allows the default page of a site

Question: Say I have a site at http://example.com. I would really like to allow bots to see the home page, but every other page needs to be blocked, as it is pointless to spider them. In other words, http://example.com and http://example.com/ should be allowed, but http://example.com/anything and http://example.com/someendpoint.aspx should be blocked. Further, it would be great if I could allow certain query strings to pass through to the home page: http://example.com?okparam=true, but not http://example.com…
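
For illustration, a minimal sketch of such a robots.txt, assuming the crawler honors the Allow directive and the $ end-of-URL anchor (Googlebot does; both are extensions beyond the original robots.txt standard, and matching against query strings is not universally supported):

    User-agent: *
    # Allow exactly the bare home page (the $ anchors the end of the URL)
    Allow: /$
    # Allow the home page with the one approved query string
    Allow: /?okparam=true$
    # Block everything else
    Disallow: /

Googlebot resolves conflicts by the most specific (longest) matching rule, so the Allow lines win over the blanket Disallow for those two URLs.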

Angular2 App: Fetch as Google doesn't load page content

Question: I am working on an Angular 2-based web app. I used Angular CLI to generate the app and then to build it for production. I have hosted the website on AWS S3 and CloudFront. When I use the 'Fetch as Google' tool from Webmaster Tools, it shows only "Loading...". Isn't Googlebot able to crawl my website? Answer: Krosan had a similar issue. I believe Googlebot does not support modern JS. I simply activated all the shims recommended by angular.io (see https://angular.io/docs/ts/latest/guide/browser-support.html) and added this in the script header: <script src="https://cdnjs.cloudflare.com/ajax/libs/core-js/2.4.1/shim.min.js"></script> If you…
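
For reference, in an Angular CLI project of that era the recommended way to activate those shims was to uncomment the core-js imports in src/polyfills.ts rather than pulling a CDN bundle; a sketch, assuming a CLI-generated layout (only a subset of the available imports shown):

    // src/polyfills.ts -- uncomment the ES6/ES7 shims so older JS engines,
    // including the one Googlebot used at the time, can run the compiled bundle
    import 'core-js/es6/symbol';
    import 'core-js/es6/object';
    import 'core-js/es6/function';
    import 'core-js/es6/array';
    import 'core-js/es6/string';
    import 'core-js/es7/reflect';
    import 'zone.js/dist/zone'; // required by Angular itself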

HTML snippets for AngularJS app that uses pushState?

Question: I'm deciding whether it's safe to develop my client-facing app in AngularJS using pushState. I've read that when using pushState in an AngularJS app, we don't need to worry about Googlebot because it can now execute enough JS to produce an HTML snippet for itself. But then I wonder about Bing, Facebook, and other bots and scrapers. The tutorials I've seen for making AngularJS SEO-friendly all deal with apps that use hashbangs (#!). These don't apply to me since I'm not using hashbangs. Does…
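
For context, the AJAX crawling scheme of that period (since deprecated by Google) did cover pushState apps: a page could opt in with a fragment meta tag, which told participating crawlers (Google and Bing among them) to fetch an ?_escaped_fragment_= variant of the URL and index whatever pre-rendered HTML the server returned for it. A sketch:

    <!-- In the <head> of a pushState page: a crawler that supports the
         scheme will request http://example.com/some/path?_escaped_fragment_=
         and expect static HTML in response -->
    <meta name="fragment" content="!">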

Googlebot receiving missing template error for an existing template

In the last couple of days, we have started to receive a missing template error when Googlebot attempts to access our main home page (welcome/index). I have been staring at this for a couple of hours and know that I am just missing something simple. An ActionView::MissingTemplate occurred in welcome#index: Missing template welcome/index with {:handlers=>[:erb, :rjs, :builder, :rhtml, :rxml, :haml], :formats=>["*/*;q=0.9"], :locale=>[:en, :en]} But the template does exist (index.html.haml); if it didn't, no one could access our home page. Here is some additional environment information: …
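
The :formats=>["*/*;q=0.9"] in the error is the usual tell: Googlebot sends the header Accept: */*;q=0.9, which some Rails versions of that era failed to map to :html, so no template matched even though index.html.haml exists. A common workaround was to force the format in a before filter; a sketch, assuming a Rails 2.3/3.x-style app (the filter name here is made up for illustration):

    # app/controllers/application_controller.rb
    class ApplicationController < ActionController::Base
      before_filter :normalize_unknown_format

      private

      # Googlebot's "Accept: */*;q=0.9" can leak through as a bogus format;
      # fall back to HTML so the normal template lookup succeeds.
      def normalize_unknown_format
        request.format = :html if request.format.to_s.include?('*/*')
      end
    end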

Do I need to add nofollow rel attribute to links if the href page contains a robots meta tag containing noindex and nofollow?

Question: If I have a page ("dontFollowMe.html") with the meta tag <meta name="robots" content="noindex, nofollow" /> ... and I link to that page ... do I need to include the nofollow rel attribute on the a element? <a href="dontFollowMe.html" rel="nofollow">sign in</a> Thanks. Answer 1: No, you don't necessarily need to use nofollow on a link to a page that is noindexed (for the technical reasons your question describes). nofollow = "Do not pass link juice to this page. Just pretend it doesn't exist." Of course, this is just a suggestion to the search engines. noindex = "Do not index this page. I don't care…
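
Side by side, the two mechanisms look like this (a sketch; they operate independently, the meta tag on the target page and the rel attribute on the referring page):

    <!-- dontFollowMe.html: the target page opts itself out of indexing
         and asks crawlers not to follow its own outgoing links -->
    <meta name="robots" content="noindex, nofollow" />

    <!-- Referring page: the target's meta tag already keeps it out of
         the index; rel="nofollow" here only additionally asks engines
         not to pass link equity through this particular link -->
    <a href="dontFollowMe.html" rel="nofollow">sign in</a>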

Verifying Googlebot in .htaccess file

Question: I have been investigating a bit. Will the code below work? It's not so easy to check.

    RewriteEngine on
    HostnameLookups Double
    RewriteCond %{REMOTE_HOST} (\.googlebot\.com) [NC]
    RewriteRule ^(.*)$ /do-something [L,R]

What worries me most is the HostnameLookups Double part. Some sources say it works only in httpd.conf, virtual host, and directory context (not sure what that last one means if not .htaccess, but .htaccess isn't listed). Do you have knowledge about this issue? Answer 1: You can use a condition on the %{HTTP_USER_AGENT} variable:

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
    RewriteRule ^(.*)$ /do-something [L,R]
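
Two caveats worth noting on this. First, the user-agent header is trivially spoofed, and the regex above has been loosened to a case-insensitive, unanchored match because current Googlebot user-agent strings begin with "Mozilla/5.0 (compatible; Googlebot/2.1; ...)", which an anchored ^googlebot would never match. Second, on the asker's concern: Apache's HostnameLookups directive is only valid in server config, virtual host, and directory context, so it generally cannot be enabled from an .htaccess file; without it, %{REMOTE_HOST} typically just holds the client IP. Google's documented way to verify a genuine Googlebot is a reverse DNS lookup confirmed by a forward lookup; done by hand with an example IP:

    host 66.249.66.1
    #  ...domain name pointer crawl-66-249-66-1.googlebot.com.  (reverse lookup)
    host crawl-66-249-66-1.googlebot.com
    #  ...has address 66.249.66.1  (forward lookup must return the same IP)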

Googlebot doesn't see jQuery-generated content

Question: I use jQuery to retrieve content from the database with a JSON request. It then replaces a wildcard in the HTML (like %title%) with the actual content. This works great, and this way I can maintain my multi-language texts in a database, but Googlebot only sees the wildcards, not the actual content. I know Googlebot sees pages without JavaScript, but is there a way to deal with this? Thanks! Answer: You should give this document at Google a thorough read. It discusses how to enable Googlebot to index pages where content changes depending on changing #hashfragment values in the URL, and pages where content…
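
The document in question described the (now-retired) AJAX crawling scheme: the server detects a crawler's _escaped_fragment_ request and returns pre-rendered HTML, while normal visitors get the jQuery-driven template. A minimal sketch of the idea in Express; renderSnapshot here is an assumed stand-in for whatever fills the %title%-style wildcards server-side from the same database:

    // Hypothetical Express sketch of the _escaped_fragment_ hand-off
    const express = require('express');
    const app = express();

    // Assumed helper: fills the wildcards server-side from the database
    function renderSnapshot(fragment) {
      return '<html><head><title>Pre-rendered title</title></head></html>';
    }

    app.get('/', (req, res) => {
      if ('_escaped_fragment_' in req.query) {
        res.send(renderSnapshot(req.query._escaped_fragment_)); // crawler: static HTML
      } else {
        res.sendFile(__dirname + '/index.html'); // humans: jQuery fills the wildcards
      }
    });

    app.listen(3000);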

Can you deploy Watir on Heroku to generate HTML Snapshots? If so, how?

I would like to generate HTML snapshots using Watir, hosted on Heroku. Google's Full Specification for Making AJAX Applications Crawlable suggests using HtmlUnit... see "How do I create an HTML snapshot?", point #3. HtmlUnit is a Java-only headless browser emulator, and unfortunately JRuby is not an option on Heroku, so HtmlUnit is ruled out (to my knowledge). If you're interested, I have another question open regarding HtmlUnit as a service hosted on Google App Engine: Making AJAX Applications Crawlable? How to build a simple web service on Google App Engine to produce HTML Snapshots? ...
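
Setting Heroku aside for a moment, the snapshot step itself is short with present-day Watir driving headless Chrome (running Chrome on Heroku would additionally need a suitable buildpack; the route and the #content selector below are assumptions for illustration):

    # Sketch: render a JS-heavy route in headless Chrome via Watir and dump
    # the post-AJAX DOM to a file as an HTML snapshot.
    require 'watir'

    browser = Watir::Browser.new :chrome, headless: true
    browser.goto 'http://example.com/#!/products/42'   # hypothetical route
    browser.div(id: 'content').wait_until(&:present?)  # wait for the AJAX render
    File.write('snapshot.html', browser.html)
    browser.close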