Disallow pdf files from indexing (Robots.txt)

丶灬走出姿态 · Submitted 2019-12-12 02:52:27

Question


I have links being indexed that shouldn't be, and I need to remove them from Google. What should I enter in robots.txt? Example link: http://sitename.com/wp-content/uploads/2014/02/The-Complete-Program-2014.pdf


Answer 1:


With robots.txt, you can disallow crawling, not indexing.

With this robots.txt:

User-agent: *
Disallow: /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf

any URL whose path starts with /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf is not allowed to be crawled.
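If you want to keep every PDF under that uploads directory out of crawling rather than listing files one by one, most major search engines (including Google and Bing) also honor the wildcard characters * and $ — note these are extensions and not part of the original robots.txt specification, so smaller crawlers may ignore them:

User-agent: *
Disallow: /wp-content/uploads/*.pdf$

Here * matches any sequence of characters in the path and $ anchors the rule to the end of the URL, so only paths ending in .pdf are affected.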

But if a bot finds this URL in some other way (e.g., linked by someone else), they might still index it (without ever crawling/visiting it). The same goes for search engines that already indexed it: they might keep it (but will no longer visit it).

To disallow indexing, you could use the HTTP header X-Robots-Tag with the noindex directive. In that case, you should not block crawling of the file in robots.txt; otherwise bots would never be able to see your headers (and so they would never know that you don't want this file to get indexed).
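On an Apache server, for example, this header could be sent for all PDFs with a configuration fragment like the following (a sketch, assuming the mod_headers module is enabled; place it in the server config or an .htaccess file):

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>

After deploying this, you can verify with curl -I that responses for .pdf URLs include the X-Robots-Tag: noindex header; search engines will then drop the files from their index the next time they crawl them.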



Source: https://stackoverflow.com/questions/32129121/disallow-pdf-files-from-indexing-robots-txt
