robots.txt

Block search engines from crawling beyond a directory depth

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-20 06:36:22

Question: My site has URLs (at their longest) structured like this: http://www.example.com/xyz-pqr/abcd-efgh/123.html, so there is a maximum of three directory levels. But because of the CMS and other problems, my site is getting indexed in search engines for URLs deeper than three directory levels, such as:

http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/abcd-efgh/123.html

I want to write code in robots.txt so that search engines will never crawl more than 3 …
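
A minimal sketch of one way to express "nothing deeper than the normal depth", assuming the goal is to block any URL whose path contains three or more directory segments. The * wildcard is an extension honored by major crawlers such as Googlebot and Bingbot, not part of the original robots.txt standard:

    User-agent: *
    # /xyz-pqr/abcd-efgh/123.html has only two slash-terminated segments and stays crawlable;
    # any path with a third directory segment matches and is blocked.
    Disallow: /*/*/*/

Since robots.txt matching is prefix-based, /*/*/*/ catches the runaway four- and five-level URLs while leaving the legitimate two-directory pages alone.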

Will this robots.txt only allow googlebot to index my site?

末鹿安然 submitted on 2019-12-20 06:13:55

Question: Will this robots.txt file only allow Googlebot to index my site's index.php file? Caveat: I have an .htaccess redirect, so people who type in http://www.example.com/index.php are redirected to simply http://www.example.com/. So, this is my robots.txt file content:

User-agent: Googlebot
Allow: /index.php
Disallow: /

User-agent: *
Disallow: /

Thanks in advance!

Answer 1: Not really. Good bots: only "good" bots follow the robots.txt instructions (not all robots and spiders bother to read/follow …
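
One wrinkle the question itself hints at: since /index.php redirects to /, the bare homepage URL is what Googlebot ends up fetching, and the file above disallows it. A hedged sketch that also permits exactly the root URL, relying on the Allow and $ extensions that Googlebot supports (neither is in the original standard):

    User-agent: Googlebot
    Allow: /$           # exactly the homepage, nothing deeper
    Allow: /index.php$  # the pre-redirect URL
    Disallow: /

    User-agent: *
    Disallow: /

For Googlebot the most specific (longest) matching rule wins, so the two Allow lines override the blanket Disallow for those two URLs only.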

Can disallowing the entire website in robots.txt have consequences after removal?

本秂侑毒 submitted on 2019-12-20 02:56:34

Question: I've published a website and, due to a misunderstanding that did not depend on me, I had to block all the pages before indexing. Some of these pages had already been linked on social networks, so to avoid a bad user experience I decided to insert the following code into robots.txt:

User-agent: *
Disallow: *

I've received a "critical problem" alert in Webmaster Tools and I'm a bit worried about it. In your experience, would it be sufficient (whenever possible) to restore the original robots.txt …
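
As an aside, the conventional form of a site-wide block uses a path, not a bare asterisk; some parsers tolerate Disallow: *, but it is not what the standard describes. A sketch of both phases, assuming the block is meant to be temporary:

    # While the site must stay out of search engines:
    User-agent: *
    Disallow: /

    # After launch, restore a permissive file (an empty Disallow allows everything):
    User-agent: *
    Disallow:

Crawlers re-fetch robots.txt regularly (Google caches it for up to about a day), so once the permissive file is back the block simply stops applying; re-crawling the pages themselves can take longer, but there is no lasting penalty from the file itself.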

Robots.txt deny rule for a #! URL

☆樱花仙子☆ submitted on 2019-12-18 09:19:24

Question: I am trying to add a deny rule to a robots.txt file to deny access to a single page. The website URLs work as follows:

http://example.com/#!/homepage
http://example.com/#!/about-us
http://example.com/#!/super-secret

JavaScript then swaps out the DIV that is displayed, based on the URL. How would I request that a search engine spider not list the following?

http://example.com/#!/super-secret
http://example.com/index.php#!/super-secret

Thanks in advance.

Answer 1: You can actually do this multiple ways, …
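
One detail makes this answerable at all: the fragment (everything after #) is never sent to the server, so robots.txt cannot match #!/super-secret directly. Under Google's old AJAX crawling scheme, however, #! URLs were rewritten to ?_escaped_fragment_= before being fetched, and that rewritten form is visible to robots.txt. A hedged sketch, assuming crawlers that still honor the scheme:

    User-agent: *
    # #!/super-secret is requested as ?_escaped_fragment_=/super-secret;
    # the leading * also covers the /index.php variant.
    Disallow: /*_escaped_fragment_=/super-secret

That scheme has since been deprecated, so for modern crawlers the reliable route is a noindex signal on the rendered page rather than a robots.txt rule.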

How to add `nofollow, noindex` to all pages in robots.txt?

泪湿孤枕 submitted on 2019-12-18 08:32:05

Question: I want to add nofollow and noindex to my site whilst it's being built. The client has requested that I use these rules. I am aware of <meta name="robots" content="noindex,nofollow">, but I only have access to the robots.txt file. Does anyone know the correct format I can use to apply noindex, nofollow rules via the robots.txt file?

Answer 1: noindex and nofollow mean you do not want your site crawled by search engines, so simply put this code in robots.txt:

User-agent: *
Disallow: /

it means noindex and …
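
A caution on the answer above: Disallow blocks crawling, not indexing. A disallowed URL can still appear in results if other sites link to it, and a crawler that cannot fetch the page never sees a noindex meta tag. Google also ended support for the unofficial Noindex: directive in robots.txt in September 2019. So the robots.txt-only option is the crawl block below; a true noindex/nofollow needs the meta tag or an X-Robots-Tag response header:

    User-agent: *
    Disallow: /   # blocks crawling; does not guarantee removal from the index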

How to stop Google indexing my Github repository

喜夏-厌秋 submitted on 2019-12-17 17:25:35

Question: I use GitHub to store the text of one of my websites, but the problem is that Google indexes the text on GitHub as well, so the same text shows up both on my site and on GitHub. E.g., in this search the top hit is my site and the second hit is the GitHub repository. I don't mind if people see the sources, but I don't want Google to index them (and maybe penalize me for duplicate content). Is there any way, besides taking the repository private, to tell Google to stop indexing it? What happens in the case …

Wildcards in robots.txt

Deadly submitted on 2019-12-17 17:13:14

Question: In a WordPress website I have categories in this order:

-Parent
--Child
---Subchild

I have permalinks set to %category%/%postname%. To use an example: I create a post named "Sport game"; its slug is sport-game and its full URL is domain.com/parent/child/subchild/sport-game. The reason I use this kind of permalink is precisely to make it easier to block some content in robots.txt. And now this is the part I have a question about. In robots.txt:

User-agent: Googlebot
Disallow: /parent/*
Disallow: /parent/*/* …
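
Worth knowing before piling up wildcards: robots.txt matching is prefix-based, so for engines that support *, Disallow: /parent/* behaves the same as Disallow: /parent/, and adding /parent/*/* on top changes nothing. A hedged sketch of a finer-grained setup, relying on Google's Allow extension and its longest-match-wins rule:

    User-agent: Googlebot
    Disallow: /parent/                  # blocks the whole tree (a trailing * would be redundant)
    Allow: /parent/child/               # re-opens the child level (longer rule wins)
    Disallow: /parent/child/subchild/   # blocks only the deepest level again

With these three lines, /parent/foo is blocked, /parent/child/bar is crawlable, and /parent/child/subchild/sport-game is blocked.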

Ignore URLs with specific parameters in robots.txt?

我的未来我决定 submitted on 2019-12-17 10:31:56

Question: I would like Google to ignore URLs like this: http://www.mydomain.com/new-printers?dir=asc&order=price&p=3. All URLs that have the parameters dir, order and p should be ignored, but I don't have any experience with robots.txt. Any idea?

Answer 1: Here's a solution if you want to disallow query strings:

Disallow: /*?*

or, if you want to be more precise about your query string:

Disallow: /*?dir=*&order=*&p=*

You can also add to the robots.txt which URLs to allow:

Allow: /new-printer$

The $ will make …
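
One caveat on the precise version: Disallow: /*?dir=*&order=*&p=* only matches when the parameters appear in exactly that order. A hedged sketch that matches each parameter independently; note too that the quoted Allow: /new-printer$ would not match /new-printers, because $ demands an exact end of URL:

    User-agent: *
    Disallow: /*?*dir=
    Disallow: /*?*order=
    Disallow: /*?*p=      # caution: also matches any parameter name ending in p, e.g. ?sp=1
    Allow: /new-printers$ # exactly the parameter-free page

Wildcards and $ are Google/Bing extensions; crawlers that implement only the original standard ignore them.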

How to disallow service API and multilingual URLs in robots.txt

随声附和 submitted on 2019-12-13 16:42:25

Question: I need to disallow the following URLs:

service API: /_s/user, /_s/place, ... (everything starting with /_s/)
save form: /{language}/save, for example /en/save, /ru/save, ...

Note: most URLs have a language parameter at the beginning (/en/event, ...), and I don't want to block those. It should be something like this (but this is not allowed by the robots.txt format):

Disallow: /_s/*
Disallow: /:lang/save

Answer 1: In robots.txt, matching is from the left, so a rule matches anything that begins with /pattern. A wildcard like /…
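
A hedged sketch using the wildcard and anchor extensions that Google and Bing support (they are not part of the original robots.txt standard):

    User-agent: *
    Disallow: /_s/      # prefix matching already covers /_s/user, /_s/place, ...
    Disallow: /*/save$  # /en/save, /ru/save, ...; $ keeps /en/save-draft crawlable

One caveat: * matches any sequence of characters, slashes included, so /*/save$ would also block a deeper URL such as /en/foo/save. If the language codes are a small fixed set, listing Disallow: /en/save, Disallow: /ru/save, and so on avoids the ambiguity.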