Will this robots.txt only allow googlebot to index my site?

Submitted by 三世轮回 on 2019-12-20 06:13:23

Question


Will this robots.txt file only allow Googlebot to index my site's index.php file? Caveat: I have an .htaccess redirect so that people who type in

http://www.example.com/index.php

are redirected to simply

http://www.example.com/

So, here is the content of my robots.txt file:

User-agent: Googlebot
Allow: /index.php
Disallow: /

User-agent: *
Disallow: /
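One way to sanity-check this file is Python's standard urllib.robotparser module. The sketch below parses the file and asks which URLs each crawler may fetch; the sample URLs and the "Bingbot" name are only illustrative. Keep in mind that robotparser applies the first matching rule in file order, while Google documents a longest-match rule; for this particular file the two approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, split into lines.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /index.php
Disallow: /

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)  # parse() also records the file as "read"

# Illustrative (agent, URL) pairs on the question's example domain.
checks = [
    ("Googlebot", "http://www.example.com/index.php"),   # True
    ("Googlebot", "http://www.example.com/"),            # False
    ("Googlebot", "http://www.example.com/about.html"),  # False
    ("Bingbot", "http://www.example.com/index.php"),     # False (falls to "*")
]

for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

Note that Googlebot may fetch /index.php but not the bare root /, which is exactly the URL the .htaccess redirect sends visitors to.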

Thanks in advance!


Answer 1:


Not really.

Good bots
Only "good" bots follow robots.txt instructions; not all robots and spiders bother to read or obey robots.txt. That might not even include all of the major search engines' bots, but it definitely means that some web crawlers will completely ignore your requests. If you really want to stop bots and crawlers from seeing parts of your site, look at using .htaccess rules or password protection.
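The point is that robots.txt is only advisory: the check happens inside the crawler, and nothing on the server enforces it. A minimal sketch, where the function name, bot names and URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /",
])

def urls_to_request(urls, agent, obey_robots):
    """Return the URLs a crawler would actually request.

    A "good" bot filters the list through robots.txt; a "bad" bot simply
    skips the check, and the server never notices the difference.
    """
    if not obey_robots:
        return list(urls)
    return [u for u in urls if parser.can_fetch(agent, u)]

site = ["http://www.example.com/", "http://www.example.com/private.html"]
print(urls_to_request(site, "PoliteBot", obey_robots=True))  # []
print(urls_to_request(site, "RudeBot", obey_robots=False))   # both URLs
```

Real protection has to come from the server side (access rules or authentication), because only the server can refuse a request.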

Second checks
Google makes multiple visits to your website, including some where it appears as an ordinary browsing user. This second visit ignores the robots.txt file. It probably doesn't index anything (if that's your worry), but it does check that you're not trying to fool the indexing bot (for SEO purposes, etc.).

That being said, your syntax is right. If that's all you're asking, then yes, it will work, just not as well as you might hope.
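Why the syntax works: RFC 9309 and Google's robots.txt documentation resolve conflicting rules by picking the most specific (longest) matching path, with Allow winning an exact tie, so rule order inside a group does not matter. A minimal sketch of that precedence for a single group; the function name is hypothetical and the * and $ wildcards are deliberately left out:

```python
def is_allowed(rules, path):
    """Longest-match precedence for one robots.txt group.

    rules: list of (directive, prefix) pairs, e.g. ("allow", "/index.php").
    Wildcards (* and $) are not supported in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        # An empty prefix ("Disallow:" with no value) matches nothing.
        if prefix and path.startswith(prefix):
            is_allow = directive == "allow"
            # A longer match wins; on an equal-length tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed

# The Googlebot group from the question, deliberately listed in reverse order.
googlebot_rules = [("disallow", "/"), ("allow", "/index.php")]

for path in ["/index.php", "/", "/contact.php"]:
    print(path, is_allowed(googlebot_rules, path))
```

"/index.php" is allowed because the 10-character Allow prefix beats the 1-character Disallow prefix, while "/" itself only matches the Disallow rule and stays blocked.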




Answer 2:


Absent the redirect, Googlebot would not see your site, except for index.php.

With the redirect, it depends on how the bot handles redirects and how your .htaccess performs the redirect. If you return a 302, Googlebot will go to http://www.example.com/, check that URL against robots.txt, and not see the main site. Even if you do an internal redirect and tell Googlebot that the responding page is http://www.example.com/, it will see the page but might not index it.
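The external-redirect case can be simulated with a crawler that re-checks robots.txt before every hop. This is a sketch under stated assumptions: the REDIRECTS table is a hypothetical stand-in for the .htaccess rule, and crawl() is an illustrative function, not Googlebot's actual logic.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse("""\
User-agent: Googlebot
Allow: /index.php
Disallow: /
""".splitlines())

# Hypothetical stand-in for the .htaccess rule: /index.php answers with an
# external redirect (such as a 302) whose Location is the site root.
REDIRECTS = {"http://www.example.com/index.php": "http://www.example.com/"}

def crawl(url, agent="Googlebot", max_hops=5):
    """Follow redirects, re-checking robots.txt before every request."""
    for _ in range(max_hops):
        if not parser.can_fetch(agent, url):
            return f"blocked by robots.txt at {url}"
        if url not in REDIRECTS:
            return f"fetched {url}"
        url = REDIRECTS[url]  # the crawler requests the redirect target next
    return "too many redirects"

print(crawl("http://www.example.com/index.php"))
```

The first request is allowed, but the redirect target "/" is disallowed, so the crawler ends up with nothing to index.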




Answer 3:


It's risky. To make sure that Google does index your homepage, use this:

User-agent: *
Allow: /index.php
Disallow: /a
Disallow: /b
...
Disallow: /z
Disallow: /0
...
Disallow: /9

This way, your root "/" will not match any of the Disallow rules.
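A quick check of this approach with urllib.robotparser. This assumes the "..." in the rules stands for one Disallow line per lowercase letter a to z and per digit 0 to 9; the sample paths are illustrative.

```python
import string
from urllib.robotparser import RobotFileParser

# Assumption: "..." means one Disallow line per character a-z and 0-9.
lines = ["User-agent: *", "Allow: /index.php"]
lines += [f"Disallow: /{c}" for c in string.ascii_lowercase + string.digits]

parser = RobotFileParser()
parser.parse(lines)

for path in ["/", "/index.php", "/about.html", "/2010/news.html"]:
    # "/" matches none of the Disallow prefixes, so it is allowed;
    # "/index.php" hits the Allow line before (and longer than) "Disallow: /i".
    print(path, parser.can_fetch("Googlebot", "http://www.example.com" + path))
```

Path matching is case-sensitive, so a path starting with an uppercase letter or punctuation (for example /About.html) is not caught by these lowercase rules.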

Also, if you use AdSense, don't forget to add:

User-agent: Mediapartners-Google
Allow: /
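With a dedicated group, the AdSense crawler is matched by its own User-agent line instead of the "*" group. A small check with urllib.robotparser; the "*" group is shortened to two rules here purely for brevity.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse("""\
User-agent: Mediapartners-Google
Allow: /

User-agent: *
Allow: /index.php
Disallow: /a
""".splitlines())

url = "http://www.example.com/about.html"
# The AdSense crawler matches its own group, which allows everything.
print("Mediapartners-Google", parser.can_fetch("Mediapartners-Google", url))
# Googlebot has no group of its own here, so it falls back to "*".
print("Googlebot", parser.can_fetch("Googlebot", url))
```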


Source: https://stackoverflow.com/questions/3805831/will-this-robots-txt-only-allow-googlebot-to-index-my-site
