googlebot

HTTP status code for overloaded server

空扰寡人 submitted on 2019-12-12 08:22:09
Question: During some hours my web site's server is under too much load. Which HTTP status code should I send to the Googlebot that visits my website? Is "269 Call Back Later" suitable for this case, or 503 Service Unavailable, or do you have any other suggestions?

Answer 1: 503 means the service is temporarily unavailable, so it is appropriate to use while the server is overloaded. http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html The Wikipedia article defines 269 as the initial response for a request that…
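A minimal PHP sketch of the approach in Answer 1, assuming a PHP front controller and a hypothetical is_overloaded() check: send 503 plus a Retry-After header so Googlebot knows to come back later.

    <?php
    // Hypothetical load check -- replace with your own metric (load average, queue depth, etc.).
    function is_overloaded(): bool {
        $load = sys_getloadavg();          // returns [1min, 5min, 15min] load averages
        return $load !== false && $load[0] > 10;
    }

    if (is_overloaded()) {
        http_response_code(503);           // Service Unavailable: temporary, safe for crawlers
        header('Retry-After: 3600');       // hint: try again in an hour
        exit('Service temporarily unavailable.');
    }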

How to return proper 404 for google while providing user friendly content to the user?

梦想的初衷 submitted on 2019-12-11 13:33:19
Question: I am bouncing between posting this here and on Super User. Please excuse me if you feel this does not belong here. I am observing the behavior described here: Googlebot is requesting random URLs on my site, like aecgeqfx.html or sutwjemebk.html. I am sure that I am not linking these URLs from anywhere on my site. I suspect this may be Google probing how we handle non-existent content; to cite from an answer to the linked question: [Google is requesting random URLs to] see if your site…
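One way to do what the title asks, sketched in PHP (file names are illustrative): send a real 404 status code and still render a human-friendly page in the body; crawlers read the status line, users read the HTML.

    <?php
    // In the front controller, when no route or record matches the requested URL:
    http_response_code(404);                          // Googlebot sees a genuine 404
    include __DIR__ . '/templates/friendly-404.php';  // hypothetical template with search box, links, etc.
    exit;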

Block some of dynamic pages from search engines

送分小仙女 submitted on 2019-12-11 11:44:15
Question: I need to block some of my pages from search engines. How can I do that? The app has been developed using ASP.NET MVC and AngularJS. Thanks in advance. These are the URLs which I want to block from the search engines:

http://localhost:12534/myurl123-event?participant=12957
http://localhost:12534/myurl123-event

Note: the last part of the URL is dynamic (i.e. myurl123-event?participant=12957 and myurl123-event).

Answer 1: You can use a robots.txt with a disallow setting:

    User-agent: *
    Disallow:…
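The answer is cut off above; a hedged completion of the same idea, using the URLs from the question (the exact rules depend on your URL scheme), is a robots.txt in the site root:

    User-agent: *
    # block a specific event page (prefix match also covers ?participant=... query strings)
    Disallow: /myurl123-event
    # or, if every such page ends in "-event", a wildcard pattern (supported by Googlebot):
    Disallow: /*-event

Note that robots.txt prevents crawling, not necessarily indexing of URLs Google already knows about; for ASP.NET MVC pages a noindex meta tag or an X-Robots-Tag response header is the stricter option.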

Googlebot Crawl Error 500 and PHP Error reporting (with a strange solution)

霸气de小男生 submitted on 2019-12-11 05:16:20
Question: So Google wouldn't crawl anywhere on my live site other than some simple first pages; instead it just gave me 500 errors. Fetching as Google in Webmaster Tools showed that it would return the full HTML output with the header "HTTP/1.0 500 Internal Server Error". I work locally in XAMPP with display_errors turned on but couldn't see any problems there, so I checked the error_log on the live server: nothing there either. Eventually I decided to switch on display_errors on the server; I don…
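A small, hedged runtime sketch of the usual middle ground implied by the question: keep display_errors off in production but make sure errors actually reach a log file, so a 500 served to Googlebot leaves a trace (the log path is an assumption; the same settings can live in php.ini).

    <?php
    // Production-safe error handling
    ini_set('display_errors', '0');                         // never show errors to visitors or bots
    ini_set('log_errors', '1');                             // but do record them
    ini_set('error_log', __DIR__ . '/logs/php-error.log');  // hypothetical log path
    error_reporting(E_ALL);                                 // report everything, including notices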

Is localization using Cookies search engine compatible?

眉间皱痕 submitted on 2019-12-11 04:38:17
Question: I'm in the process of localizing a website. I was going to go the route of setting a cookie for the preferred language and then displaying the respective language. If no cookie was set, it would use the preferred-language header sent by the user's browser, and if that header was not set it would default to English. So, how does Google's bot work? Will it crawl the website once for each language, with a different language set in the headers, so that it can get each version of the website, or does it…
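A PHP sketch of the fallback chain the question describes (cookie, then Accept-Language header, then English); the cookie name and the language list are assumptions.

    <?php
    $supported = ['en', 'de', 'fr'];   // hypothetical list of available translations
    $lang = 'en';                      // default

    if (isset($_COOKIE['lang']) && in_array($_COOKIE['lang'], $supported, true)) {
        $lang = $_COOKIE['lang'];      // explicit user preference wins
    } elseif (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        // crude parse: take the primary subtag of the first listed language
        $first = strtolower(substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2));
        if (in_array($first, $supported, true)) {
            $lang = $first;
        }
    }
    // Googlebot typically sends no cookies, so it will land on the header-based or default version.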

prevent googlebot from indexing file types in robots.txt and .htaccess

余生长醉 submitted on 2019-12-10 20:17:48
Question: There are many Stack Overflow questions on how to prevent Googlebot from indexing, for instance, txt files. There's this:

robots.txt:

    User-agent: Googlebot
    Disallow: /*.txt$

.htaccess:

    <Files ~ "\.txt$">
    Header set X-Robots-Tag "noindex, nofollow"
    </Files>

However, what is the syntax for both of these when trying to prevent two types of files from being indexed? In my case: txt and doc.

Answer 1: In your robots.txt file:

    User-agent: Googlebot
    Disallow: /*.txt$
    Disallow: /*.doc$

More details at…
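The answer snippet is cut off before the .htaccess half; a likely completion (Apache syntax, so treat it as a sketch) uses FilesMatch with an alternation so one rule covers both extensions:

    # requires mod_headers, same as the single-extension version above
    <FilesMatch "\.(txt|doc)$">
    Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>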

AngularJS / AJAX app and search engine crawlers

匆匆过客 submitted on 2019-12-10 00:01:17
Question: I've got a web app which heavily uses AngularJS / AJAX and I'd like it to be crawlable by Google and other search engines. My understanding is that I need to do something special to make it work, as described here: https://developers.google.com/webmasters/ajax-crawling Unfortunately, that looks quite nasty and I'd rather not introduce the hashbang (#!) URLs. What I'd like to do is serve a static page to Googlebot (based on the User-Agent), either directly or by sending it a 302 redirect. That way,…
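A rough PHP illustration of the User-Agent approach the question proposes (not an endorsement; Google treats User-Agent-based differences as cloaking if the content diverges). The snapshot directory and mechanism are assumptions.

    <?php
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

    if (stripos($ua, 'Googlebot') !== false) {
        // Serve a pre-rendered HTML snapshot of the Angular view (hypothetical directory).
        // A real implementation should sanitize the path against traversal.
        $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
        $snapshot = __DIR__ . '/snapshots' . $path . '.html';
        if (is_file($snapshot)) {
            readfile($snapshot);
            exit;
        }
    }
    // Otherwise fall through to the normal AngularJS single-page app.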

PHP code to exclude Google

别来无恙 submitted on 2019-12-06 09:20:00
I have a classifieds website. On this website I store in the DB each product page that a user visits, for history purposes, so he can view the last products he visited. The problem is that when Googlebot and others enter my site, the DB fills up with thousands of entries because it stores the thousands of product pages Google visits. I tried various functions with $_SERVER['HTTP_USER_AGENT'] to find out whether the current user is Googlebot or not and, if it is, not store the page views in the DB so that it's not spammed with useless results, but none of them seem to work, as I get the Google IPs…
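One common fix, sketched here with an assumed save_history() helper and illustrative variables: match a few bot keywords in the User-Agent case-insensitively before writing the row.

    <?php
    function is_bot(string $userAgent): bool {
        // covers Googlebot, Bingbot, and most crawlers that identify themselves
        return (bool) preg_match('/bot|crawl|slurp|spider|mediapartners/i', $userAgent);
    }

    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

    if (!is_bot($ua)) {
        save_history($productId, $userId);   // hypothetical function that inserts the page view
    }

For stricter verification, Google recommends a reverse DNS lookup on the requesting IP rather than trusting the User-Agent string, since the string can be spoofed.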

Googlebots Ignoring robots.txt? [closed]

点点圈 submitted on 2019-12-05 03:19:08
Closed. This question is off-topic and is not currently accepting answers. Closed 7 years ago.

I have a site with the following robots.txt in the root:

    User-agent: *
    Disabled: /

    User-agent: Googlebot
    Disabled: /

    User-agent: Googlebot-Image
    Disallow: /

And pages within this site are getting scanned by Google's bots all day long. Is there something wrong with my file or with Google?

Answer: It should be Disallow:, not Disabled:. Maybe give the Google robots.txt checker a try; Google has an analysis tool for checking…
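For reference, a corrected robots.txt along the lines the answer suggests (assuming the intent really is to block all three listed user agents from everything):

    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Disallow: /

    User-agent: Googlebot-Image
    Disallow: /

When per-bot groups are present, Googlebot obeys its own group rather than the * group, but here every group disallows everything, so the effect is the same.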

Can a 301 page be crawled by Google?

六眼飞鱼酱① submitted on 2019-12-04 05:33:07
Question: Is it possible for Google or any other crawler to crawl and index a page which returns a 301 status code? I have seen a page in Google which has had a 301 for months, yet the cache date of that page in the index is from a few days ago. Can Google just ignore the 301 and crawl the contents of the page?

Answer: Google always crawls the target of a redirect; HTTP 301 is not an exception. I could not find a better source than one employee's discussion post, though. The Google Search Appliance documentation says the same, and I don't see why GSA and Googlebot should handle redirects differently. Normally…
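If you want to confirm what status a crawler actually receives for the page, a quick hedged PHP check (curl -I from the command line works just as well) prints the raw response headers; the URL is a placeholder.

    <?php
    // get_headers() issues a request and returns the raw header lines.
    // It follows redirects by default, so the redirect target's headers are listed too.
    $headers = get_headers('https://example.com/old-page');
    foreach ($headers as $line) {
        echo $line, PHP_EOL;   // first line is the status, e.g. "HTTP/1.1 301 Moved Permanently"
    }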