I have a classifieds website. For history purposes, I store in the database each product page that a user visits, so that they can view the last products they visited.
The problem is that when Googlebot and other crawlers visit my site, the database fills up with thousands of entries, because it stores the thousands of product pages Google visits.
I tried various functions with $_SERVER['HTTP_USER_AGENT'] to find out whether the current visitor is Googlebot and, if so, not store the page views in the database, so that it isn't spammed with useless results. None of them seem to work, as I still recognize Google's IPs in my database.
Do any of you know a good way in PHP to ensure Google stays out?
You can use the following snippet, which should detect Googlebot and skip storing to the database.
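// stripos() returns false on no match; compare strictly, because a match at position 0 is also falsy
if (stripos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Googlebot') === false) {
    // log to database
}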
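The same check can be extended to other crawlers. The following is only a minimal sketch: the function name isKnownCrawler() is hypothetical, and the list of user-agent fragments is illustrative and far from complete. A bot that sends a fake or empty user agent will still get through.

```php
<?php
// Hypothetical helper: true when the user agent contains a known crawler name.
function isKnownCrawler(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        // No header sent; treated as a normal visitor here (an assumption).
        return false;
    }
    // Illustrative, incomplete list of crawler name fragments.
    $needles = ['Googlebot', 'bingbot', 'Slurp', 'DuckDuckBot', 'Baiduspider', 'YandexBot'];
    foreach ($needles as $needle) {
        // Case-insensitive search; strict comparison because position 0 is falsy.
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

if (!isKnownCrawler($_SERVER['HTTP_USER_AGENT'] ?? null)) {
    // log to database
}
```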
Why in the world would you want to keep only Google out? Other search engines may index your site as well. What about Bing, Yahoo, AltaVista and others?
You can use a robots.txt file to disallow any crawler from indexing your site.
Create a robots.txt in your web root and put the following in it:
User-agent: *
Disallow: /
If you only want to keep crawlers away from some pages, though, you can set the robots meta tag on those pages instead:
<meta name="robots" content="noindex, nofollow" />
Not all bots are "nice" and respect these tags, though.
Did you think about all the other robots, spiders and automated scripts surfing the web? They will also fill up your database, and it is hell to find out all those user agents, IPs and other characteristics. Maybe it's better to just limit the history to, let's say, 25 entries.
So my answer is: limit the entries in your history database, or store the history in a cookie on the visitor's client.
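Capping the history could look roughly like this. It is a sketch only: pushHistory() is a hypothetical helper, it works on a plain array of product IDs, and how that array is persisted (a cookie, a session, or a table row per user) is left open.

```php
<?php
// Hypothetical helper: put the product at the front of the history,
// remove any earlier occurrence, and keep at most $limit entries.
function pushHistory(array $history, int $productId, int $limit = 25): array
{
    // Drop an existing entry for this product so it is not listed twice.
    $history = array_values(array_diff($history, [$productId]));
    // Most recently viewed product comes first.
    array_unshift($history, $productId);
    // Discard everything beyond the limit.
    return array_slice($history, 0, $limit);
}
```

With a cookie, the array could be stored as JSON, for example with setcookie('history', json_encode($history), ...) called before any output is sent. The cookie name and the JSON encoding are assumptions here. A crawler that ignores cookies then never adds rows to the database at all.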
<?php echo $_SERVER['REMOTE_ADDR'];?>
will give you the IP address of the client. You can then set a session variable that stores or discards the pages, based on your own logic for checking the IP.
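One possible form of that IP check is a reverse DNS lookup followed by a forward lookup, which is the method Google describes for verifying Googlebot. The sketch below makes several assumptions: the helper names are hypothetical, only the googlebot.com and google.com domains are accepted, and gethostbyname() covers IPv4 only.

```php
<?php
// Hypothetical helper: does this hostname end in a Google crawler domain?
function isGoogleHostname(string $host): bool
{
    // The leading dot prevents matches such as "fakegooglebot.com".
    return (bool) preg_match('/\.(googlebot|google)\.com$/i', $host);
}

// Hypothetical helper: reverse lookup, domain check, then a forward
// lookup that must resolve back to the same IP.
function isVerifiedGooglebot(string $ip): bool
{
    // gethostbyaddr() returns the IP unchanged on failure and false on malformed input.
    $host = gethostbyaddr($ip);
    if ($host === false || $host === $ip || !isGoogleHostname($host)) {
        return false;
    }
    // gethostbyname() returns the hostname unchanged on failure, so this fails safely.
    return gethostbyname($host) === $ip;
}
```

DNS lookups are slow, so the result of isVerifiedGooglebot($_SERVER['REMOTE_ADDR']) is worth caching, for example in the session variable mentioned above, rather than repeating it on every page view.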
@Jan's answer is the better way, although it will cut off all robots.
Source: https://stackoverflow.com/questions/8243718/php-code-to-exclude-google