googlebot

When does Googlebot execute javascript?

主宰稳场 submitted on 2019-12-22 03:37:11
Question: I have a few single-page web apps on multiple domains that rely heavily on JavaScript/AJAX to fetch and show content. Based on logs and search results, I can tell that Googlebot runs JavaScript on some of the domains but not on others. On some it indexes everything that's only available with JS; on others it doesn't seem to run JS at all. Can anybody tell me how Googlebot decides what JS to run, and whether I can do anything to get it to run JS on my other domains? PS: I know that normally I

Can a 301 page be crawled by google?

烈酒焚心 submitted on 2019-12-21 11:35:23
Question: Is it possible for Google or any other crawler to crawl and index a page that returns a 301 status code? I have seen a page in Google that has returned a 301 for months, yet the cache date of that page in the index is from a few days ago. Can Google just ignore the 301 and crawl the contents of the page? Answer 1: Google always crawls the target of a redirect; HTTP 301 is no exception. I could not find a better source than one employee's discussion post, though. Google Search Appliance
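
To illustrate what "crawling the target of a redirect" means, here is a minimal Python sketch. It is not Google's implementation: the `PAGES` table, the example URLs, and the `resolve` helper are all hypothetical stand-ins for real HTTP requests.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory "web": URL -> (status code, redirect target, body).
PAGES: Dict[str, Tuple[int, Optional[str], str]] = {
    "https://example.com/old": (301, "https://example.com/new", ""),
    "https://example.com/new": (200, None, "current content"),
}


def resolve(url: str, max_hops: int = 5) -> Tuple[str, str]:
    """Follow 301/302 responses and return (final URL, body)."""
    for _ in range(max_hops + 1):
        status, location, body = PAGES[url]
        if status in (301, 302) and location:
            url = location  # fetch the redirect target instead
            continue
        return url, body
    raise RuntimeError("too many redirects")
```

In this sketch, the content a crawler ends up with for the old URL is whatever the redirect target serves, which would be consistent with a recent cache date on a page that has returned a 301 for months.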

How to prevent Googlebot from overwhelming site?

时光怂恿深爱的人放手 submitted on 2019-12-21 08:17:06
Question: I'm running a site with a lot of content, but little traffic, on a middle-of-the-road dedicated server. Occasionally, Googlebot will stampede us, resulting in Apache maxing out its memory and causing the server to crash. How can I avoid this?
Answer 1:
- Register at Google Webmaster Tools, verify your site, and throttle Googlebot down.
- Submit a sitemap.
- Read the Google guidelines (If-Modified-Since HTTP header).
- Use robots.txt to restrict the bot's access to some parts of the website.
- Make a script
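
The If-Modified-Since header mentioned in the answer lets a server reply with 304 Not Modified instead of resending an unchanged page, which reduces the work each crawler hit causes. Below is a minimal Python sketch of that decision. The function name `status_for` and its parameters are assumptions for illustration; a real Apache setup would normally handle this in the server or framework.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, else 200.

    `last_modified` is assumed to be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: serve the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return 304 if last_modified.replace(microsecond=0) <= since else 200
```

A handler would call this with the request's `If-Modified-Since` value and the page's last change time, and skip rendering the body whenever the result is 304.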

Angular2 App: Fetch as Google doesn't load page content

感情迁移 submitted on 2019-12-20 12:34:24
Question: I am working on an Angular2-based web app. I used Angular CLI to generate the app and then to build it for production. I have hosted the website on AWS S3 and CloudFront. When I use the 'Fetch as Google' tool from the webmaster tools, it shows only Loading... . Isn't Googlebot able to crawl my website? Answer 1: I had a similar issue. I believe Googlebot does not support modern JS. I simply activated all shims recommended by angular.io (see https://angular.io/docs/ts/latest/guide/browser-support.html) and added in the script

Googlebot receiving missing template error for an existing template

社会主义新天地 submitted on 2019-12-20 09:36:36
Question: In the last couple of days, we have started to receive a missing template error when Googlebot attempts to access our main home page (welcome/index). I have been staring at this for a couple of hours and know that I am just missing something simple. A ActionView::MissingTemplate occurred in welcome#index: Missing template welcome/index with {:handlers=>[:erb, :rjs, :builder, :rhtml, :rxml, :haml], :formats=>["*/*;q=0.9"], :locale=>[:en, :en]} But the template does exist (index.html.haml)

Can you deploy Watir on Heroku to generate HTML Snapshots? If so, how?

故事扮演 submitted on 2019-12-18 12:42:53
Question: I would like to generate HTML snapshots using Watir, hosted on Heroku. Google's Full Specification for Making AJAX Applications Crawlable suggests using HtmlUnit (see How do I create an HTML snapshot? point #3). HtmlUnit is a Java-only headless browser emulator, and unfortunately JRuby is not an option on Heroku, so HtmlUnit is ruled out (to my knowledge). If you're interested, I have another question open regarding HtmlUnit as a service hosted on Google App Engine... Making AJAX Applications

Avoid crawling part of a page with “googleoff” and “googleon”

我们两清 submitted on 2019-12-18 05:43:42
Question: I am trying to tell Google and other search engines not to crawl some parts of my web page. What I do is:
    <!--googleoff: all-->
    <select name="ddlCountry" id="ddlCountry">
      <option value="All">All</option>
      <option value="bahrain">Bahrain</option>
      <option value="china">China</option>
    </select>
    <!--googleon: all-->
After I uploaded the page, I noticed that search engines are still rendering elements within the googleoff markup. Am I doing something wrong? Answer 1: "googleon" and "googleoff" are

Googlebot is crawling my site and entering ratings on my rating system

核能气质少年 submitted on 2019-12-13 12:49:44
Question: My rating system allows anonymous users to add ratings, but Google's crawler is rating things. How can I ensure that Googlebot won't follow the link? Answer 1: You shouldn't accept a GET request for any action that modifies data (voting, editing a post, etc.). Your voting should be done via a POST request, which Googlebot won't perform. More information in this SO post: When do you use POST and when do you use GET? Answer 2: Use a robots.txt to point out links that bots shouldn't follow. For example,
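
The first answer's advice (only modify data on POST) can be sketched in Python as follows. The `ratings` store and the `handle_vote` function are hypothetical names for illustration; in a real app the web framework's routing would enforce the method.

```python
from typing import Dict, Tuple

# Hypothetical in-memory store: item id -> number of votes.
ratings: Dict[str, int] = {}


def handle_vote(method: str, item_id: str) -> Tuple[int, str]:
    """Record a vote only for POST; answer any other method with 405."""
    if method.upper() != "POST":
        return 405, "Method Not Allowed"
    ratings[item_id] = ratings.get(item_id, 0) + 1
    return 200, "OK"
```

With a handler like this, a crawler that follows a plain link issues a GET and changes nothing, while the site's own voting form or script submits a POST.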

Changing schema.org microdata with jQuery?

僤鯓⒐⒋嵵緔 submitted on 2019-12-13 05:24:47
Question: Given Googlebot's recent advances in interpreting JS, is it now possible to change schema.org microdata (i.e. itemprop) with jQuery? There's a similar question here on SO: Is it possible to change Microdata itemprop with jQuery? However, it was asked before the Googlebot advances mentioned above. Answer 1: schema.org, RDF, Microdata, etc. are designed to provide context and information for machines. They give machines a means of accessing the information without having to execute client side