Can disallowing an entire website in robots.txt have consequences after removal?

Submitted 2019-12-20 02:56:34

Question


I've published a website and, due to a misunderstanding beyond my control, I had to block all the pages from indexing. Some of these pages had already been linked on social networks, so to avoid a bad user experience I decided to put the following into robots.txt:

User-agent: *
Disallow: *

I've received a "critical problem" alert in Webmaster Tools and I'm a bit worried about it. In your experience, would it be sufficient (whenever possible) to restore the original robots.txt? Could the current situation have lasting consequences (penalties or similar) for the website if it continues for a long time (and if so, how can I fix it)? I'm sorry if the question sounds a bit generic, but I haven't been able to find specific answers. Thanks in advance.


Answer 1:


The "critical problem" occurs because Google cannot index pages on your site with your current robots.txt configuration. If you're still developing the site, this configuration is standard procedure. Webmaster Tools treats your site as if it were in production; since it sounds like you are still developing, this is something of a false-positive error message in your case.

Having this robots.txt configuration has no long-term negative effect on search engine ranking; however, the longer search engines are able to access your site, the better the ranking will tend to be. For Google, it's reportedly something like three months of stable crawling before a site earns a kind of trusted status. So it really depends on the domain, whether it has previously been indexed by Google, and for how long; but there would still be no long-term consequences. At the very most you will have to wait another three months to "earn Google's trust" again.

Most social networks read the robots.txt file at the moment the user shares a link. Search engines, on the other hand, vary in their crawl rate and can take anywhere from a few hours to a couple of weeks to detect changes in your robots.txt file and update their index.

Hope this helps. If you can provide more details about your circumstances I may be able to help further, but this should at least answer your question.




Answer 2:


My goal (for the moment) is to block all bots

Your current robots.txt does not block all bots.

In the original robots.txt specification, Disallow: * means: disallow crawling all URLs that start with *, for example:

  • http://example.com/*
  • http://example.com/****
  • http://example.com/*p
  • http://example.com/*.html

Some parsers don't follow the original specification and instead interpret * as a wildcard character. For them (and only for them) it would likely mean blocking all URLs (where * means "any character(s)").
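Python's standard-library robots.txt parser is one that follows the original, prefix-matching specification with no wildcard support, so it can illustrate the difference; a minimal sketch (example.com is a placeholder host):

```python
from urllib import robotparser

# A spec-following parser that does plain prefix matching (no wildcards).
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: *",   # a literal prefix "*", not a wildcard
])

# An ordinary URL path like "/page.html" does not begin with a literal "*",
# so this parser does not consider it disallowed.
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))
```

With this parser the page remains fetchable, because the rule is matched as a literal prefix; a wildcard-aware crawler would read the same line as "block everything", which is exactly the ambiguity described above.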

In a few words, I would like the website to be accessed only by humans and not by bots.

Then you should use:

User-agent: *
Disallow: /
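The effect of `Disallow: /` can be checked the same way with the standard-library parser; a quick sketch (example.com is a placeholder host):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",   # "/" is a prefix of every URL path, so everything matches
])

print(rp.can_fetch("Googlebot", "http://example.com/"))          # False
print(rp.can_fetch("Googlebot", "http://example.com/page.html")) # False
```

Because every URL path starts with "/", the prefix rule applies to every page, which is the unambiguous way to block all compliant bots.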


Source: https://stackoverflow.com/questions/23033969/may-disallow-entire-website-on-robots-txt-have-consequences-after-removal
