I want to scrape a website and get the page content with this code:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import D
<meta name="ROBOTS" content="value">
This meta tag tells search engines which actions they are and are not allowed to take on a given page. It can be placed anywhere between the <head> and </head> tags.
Note: Because this <meta> tag has no site-wide effect, it can contain different values on different pages of the same website.
The valid values are:
Index (default value)
Noindex
None
Follow
Nofollow
Noarchive
Nosnippet
These values can also be combined, separated by commas, to form the desired meta robots tag.
Example:
<meta name="robots" content="noindex" />
<meta name="robots" content="index,follow" />
<meta name="robots" content="index,follow,noarchive" />
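As a rough illustration of how the values combine, the sketch below builds a meta robots tag from a list of values. The names VALID_VALUES and build_robots_meta are hypothetical and exist only for this example; the set contains just the values listed above.

```python
# Values listed above, lowercased. VALID_VALUES and build_robots_meta are
# illustrative names for this sketch, not part of any library.
VALID_VALUES = {
    "index", "noindex", "none", "follow",
    "nofollow", "noarchive", "nosnippet",
}


def build_robots_meta(*values):
    """Return a meta robots tag whose content joins the values with commas."""
    cleaned = [value.strip().lower() for value in values]
    unknown = [value for value in cleaned if value not in VALID_VALUES]
    if unknown:
        raise ValueError("unsupported robots value(s): " + ", ".join(unknown))
    content = ",".join(cleaned)
    return '<meta name="robots" content="' + content + '" />'


print(build_robots_meta("index", "follow", "noarchive"))
# prints: <meta name="robots" content="index,follow,noarchive" />
```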
The NOINDEX value tells search engines NOT to index the page, so the page should not show up in search results. The NOFOLLOW value tells search engines NOT to follow or discover the pages that are linked to from this page.
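When scraping, it can be useful to check what a page's robots meta tag says before processing it. The following is a minimal sketch using only Python's standard html.parser module. The class RobotsMetaParser and the function robots_policy are hypothetical names, and treating "none" as shorthand for "noindex, nofollow" is an assumption based on how major search engines document that value.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the values of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for value in attr_map.get("content", "").split(","):
            value = value.strip().lower()
            if value:
                self.directives.add(value)


def robots_policy(html):
    """Return (may_index, may_follow) for the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    # Assumption: "none" is treated as "noindex, nofollow".
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
print(robots_policy(page))  # prints: (False, False)
```

In a Selenium script, the HTML string could come from driver.page_source. A page without the tag yields (True, True), which matches Index being the default value.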
Web developers add the NOINDEX, NOFOLLOW meta robots tag to development websites so that search engines don't accidentally start sending traffic to a website that is still under construction.
The reason can be either of the following: