googlebot | 易学教程

When does Googlebot execute javascript?

Question: I have a few single-page web apps on multiple domains that rely heavily on JavaScript/AJAX to fetch and show content. Based on logs and search results, I can tell that Googlebot runs JavaScript on some of the domains but not on others. On some it indexes everything that's only available with JS; on others it doesn't even seem to run JS at all. Can anybody tell me how Googlebot decides what JS to run, and whether I can do anything to get it to run JS on my other domains? PS: I know that normally I

Can a 301 page be crawled by google?

Question: Is it possible for Google or any other crawler to crawl and index a page that returns a 301 status code? I have seen a page in Google that has returned a 301 for months, yet the cache date of that page in the index is from a few days ago. Can Google just ignore the 301 and crawl the contents of the page? Answer 1: Google always crawls the target of a redirect; HTTP 301 is not an exception. I could not find a better source than one employee's discussion post, though. Google Search Appliance

How to prevent Googlebot from overwhelming site?

Question: I'm running a site with a lot of content, but little traffic, on a middle-of-the-road dedicated server. Occasionally, Googlebot will stampede us, causing Apache to max out its memory and the server to crash. How can I avoid this? Answer 1: Register at Google Webmaster Tools, verify your site, and throttle Googlebot down; submit a sitemap; read the Google guidelines (If-Modified-Since HTTP header); use robots.txt to restrict the bot's access to some parts of the website; make a script
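
The If-Modified-Since suggestion in the answer above can be illustrated with a minimal Python sketch. It is not taken from the original answer: the function name `conditional_status` and the sample dates are hypothetical, and it only shows the decision a server makes between a full 200 response and a cheap, body-less 304 when a crawler revalidates a page it already has.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    if not if_modified_since:
        return 200  # no validator sent: serve the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Hypothetical page that last changed on 10 January 2024.
page_changed = datetime(2024, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
fresh_header = format_datetime(datetime(2024, 1, 11, tzinfo=timezone.utc), usegmt=True)
stale_header = format_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc), usegmt=True)

print(conditional_status(page_changed, fresh_header))
print(conditional_status(page_changed, stale_header))
```

A 304 response carries no body, so a server that answers revalidation requests this way spends far less memory and bandwidth per crawler hit than one that regenerates every page.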

Angular2 App: Fetch as Google doesn't load page content

Question: I am working on an Angular2-based web app. I used Angular CLI to generate the app and then to build it for prod. I have hosted the website on AWS S3 and CloudFront. When I use the 'Fetch as Google' tool from Webmaster Tools, it shows only Loading... . Isn't Googlebot able to crawl my website? Answer 1: I had a similar issue. I believe Googlebot does not support modern JS. I simply activated all shims recommended by angular.io (see https://angular.io/docs/ts/latest/guide/browser-support.html) and added in the script

Googlebot receiving missing template error for an existing template

Question: In the last couple of days, we have started to receive a missing template error when Googlebot attempts to access our main home page (welcome/index). I have been staring at this for a couple of hours and know that I am just missing something simple. An ActionView::MissingTemplate occurred in welcome#index: Missing template welcome/index with {:handlers=>[:erb, :rjs, :builder, :rhtml, :rxml, :haml], :formats=>["*/*;q=0.9"], :locale=>[:en, :en]} But the template does exist (index.html.haml)

Can you deploy Watir on Heroku to generate HTML Snapshots? If so, how?

Question: I would like to generate HTML snapshots using Watir, hosted on Heroku. Google's Full Specification for Making AJAX Applications Crawlable suggests using HtmlUnit (see "How do I create an HTML snapshot?", point #3). HtmlUnit is a Java-only headless browser emulator, and unfortunately JRuby is not an option on Heroku, so HtmlUnit is ruled out (to my knowledge). If you're interested, I have another question open regarding HtmlUnit as a service hosted on Google App Engine... Making AJAX Applications

Avoid crawling part of a page with “googleoff” and “googleon”

Question: I am trying to tell Google and other search engines not to crawl some parts of my web page. What I do is:  <select name="ddlCountry" id="ddlCountry"> <option value="All">All</option> <option value="bahrain">Bahrain</option> <option value="china">China</option> </select>  After I uploaded the page, I noticed that search engines are still rendering elements within the googleoff markup. Am I doing something wrong? Answer 1: "googleon" and "googleoff" are

Googlebot is crawling my site and entering ratings on my rating system

Question: My rating system allows anonymous users to add ratings, but Google's crawler is rating things. How can I ensure that Googlebot won't follow the link? Answer 1: You shouldn't accept a GET request for any action that modifies data (voting, editing a post, etc.). Your voting should be done via a POST request, which Googlebot won't perform. More information in this SO post: When do you use POST and when do you use GET? Answer 2: Use a robots.txt file to point out links that bots shouldn't follow. For example,
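
Both suggestions above can be sketched in a few lines of Python. This is an independent illustration, not the code from either answer: the `/rate/` path, the `example.com` URLs, and the helper names `crawler_may_fetch` and `handle_rating` are all hypothetical. The first helper uses the standard library's `urllib.robotparser` to show how a well-behaved crawler would interpret a `Disallow` rule; the second shows a handler that refuses to change data on anything but POST.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all crawlers to stay away from /rate/ URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rate/
"""


def crawler_may_fetch(user_agent, url):
    """Check a URL against ROBOTS_TXT the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def handle_rating(method, votes, item_id, stars):
    """Record a rating only for POST; answer anything else with 405."""
    if method != "POST":
        return 405  # Method Not Allowed: following a link must not vote
    votes.setdefault(item_id, []).append(stars)
    return 204  # No Content: the vote was stored


votes = {}
print(crawler_may_fetch("Googlebot", "https://example.com/rate/42?stars=5"))
print(crawler_may_fetch("Googlebot", "https://example.com/articles/42"))
print(handle_rating("GET", votes, 42, 5), votes)
print(handle_rating("POST", votes, 42, 5), votes)
```

Note that robots.txt is only advisory, so rejecting GET on the server is the part that actually protects the data; the robots.txt rule merely keeps compliant crawlers from requesting those URLs in the first place.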

Changing schema.org microdata with jQuery?

Question: Given Googlebot's recent advances in interpreting JS, is it now possible to change schema.org microdata (i.e. itemprop) with jQuery? There's a similar question here on SO, "Is it possible to change Microdata itemprop with jQuery?", but it predates the advances mentioned above. Answer 1: schema.org, RDF, Microdata, etc. are designed to provide context and information for machines. They give machines a means of accessing the information without having to execute client side