php code to exclude google

Submitted by 扶醉桌前 on 2019-12-22 18:33:02

Question


I have a classifieds website. On this website I store in the db each product page that a user visits, for history purposes, so he can view the last products he visited.

The problem is that when Googlebot and other crawlers enter my site, the db fills up with thousands of entries because it stores the thousands of product pages Google visits.

I tried various functions with $_SERVER['HTTP_USER_AGENT'] to find out whether the current user is Googlebot and, if it is, not store the page views in the db, so that it isn't spammed with useless results. None of them seem to work, as I still recognize Google's IPs in my db.

Do any of you know a good way in PHP to ensure Google stays out?


Answer 1:


You can use the following snippet, which should detect Googlebot and skip storing to the database.

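// stripos() returns false when there is no match; compare strictly,
// because a match at position 0 would otherwise also count as "not found".
if (stripos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Googlebot') === false) {
     // not Googlebot: log to database
}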
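
The same check can be widened to other crawlers. What follows is only a minimal sketch: the helper name isLikelyBot and the list of user-agent substrings are assumptions made for illustration, and the list is far from complete. User agents can also be spoofed, so this filters well-behaved crawlers only.

```php
<?php
// Hypothetical helper: returns true when the user agent looks like a known crawler.
// The substring list is illustrative and incomplete.
function isLikelyBot(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        // A missing user agent is unusual for a real browser; treat it as a bot.
        return true;
    }
    $needles = ['Googlebot', 'bingbot', 'Slurp', 'DuckDuckBot', 'Baiduspider', 'YandexBot'];
    foreach ($needles as $needle) {
        // Strict comparison: 0 is a valid match position, false means "not found".
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

if (!isLikelyBot($_SERVER['HTTP_USER_AGENT'] ?? null)) {
    // log to database
}
```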



Answer 2:


Why in the world would you want to keep only Google out? Other search engines may index your site as well. What about Bing, Yahoo, AltaVista and others?

You can use a robots.txt file to tell all crawlers not to crawl your site.

Create a robots.txt file in your site root and put the following in it:

User-agent: *
Disallow: /

If you want to allow crawlers on the site but keep them away from certain pages, you can set the robots meta tag on those pages instead:

<meta name="robots" content="noindex, nofollow" />

Not all bots are "nice" and respect these rules, though.




Answer 3:


Did you think about all the other robots, spiders and automated scripts surfing the web? They will also fill up your database, and it is hell to keep track of all their user agents, IPs and other characteristics. Maybe it's better to just limit the history to, let's say, 25 entries.

So my answer is: limit the number of entries in your history db, or store the history in a cookie on the visitor's client.
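
A capped history could look roughly like the sketch below. The helper name addToHistory, the use of integer product ids and the session as storage are all assumptions for illustration; with a db table, the same cap would be enforced by deleting a user's oldest rows after each insert.

```php
<?php
// Hypothetical helper: put $productId at the front of the history,
// drop any earlier occurrence of it, and keep at most $limit entries.
function addToHistory(array $history, int $productId, int $limit = 25): array
{
    // Remove an earlier visit to the same product so it is not listed twice.
    $history = array_values(array_diff($history, [$productId]));
    // Newest entry goes first.
    array_unshift($history, $productId);
    // Anything beyond the limit (the oldest entries) is discarded.
    return array_slice($history, 0, $limit);
}

// Possible usage with the session as storage (assumption):
// session_start();
// $_SESSION['history'] = addToHistory($_SESSION['history'] ?? [], $currentProductId);
```

Because each visitor's history can never grow past the limit, a crawler that walks thousands of product pages leaves at most 25 entries behind.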




Answer 4:


<?php echo $_SERVER['REMOTE_ADDR'];?> 

will give you the IP address of the client. You can then set a session variable that decides whether to store or discard the page views, based on your own logic for checking the IP.
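
One way to do the IP check without maintaining a list of addresses is a reverse DNS lookup followed by a forward lookup, which is the approach Google describes for verifying its crawlers. The sketch below is an assumption-laden illustration: the helper name isVerifiedGooglebot and the injectable $reverse and $forward parameters are hypothetical (they exist only so the logic can be tried without network access), and gethostbynamel() resolves IPv4 addresses only.

```php
<?php
// Hypothetical helper: checks whether $ip belongs to Googlebot using a
// reverse DNS lookup followed by a forward lookup of the returned host name.
function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    // Default to PHP's built-in resolvers when nothing is injected.
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    // gethostbyaddr() returns the unmodified IP on failure, or false on malformed input.
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    // The host name must end in googlebot.com or google.com.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    // The forward lookup must map the host name back to the same IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Possible usage (performs real DNS lookups, so the result is worth caching):
// $isGooglebot = isVerifiedGooglebot($_SERVER['REMOTE_ADDR']);
```

DNS lookups are slow compared with a string check, so it probably makes sense to run this only when the user agent already claims to be Googlebot and to cache the result per IP.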

@Jan's answer is a better way, although that will cut off all robots.



Source: https://stackoverflow.com/questions/8243718/php-code-to-exclude-google
