How to return a proper 404 to Google while providing user-friendly content to the user?

Posted by 梦想的初衷 on 2019-12-11 13:33:19

Question


I am bouncing between posting this here and on Superuser. Please excuse me if you feel this does not belong here.

I am observing the behavior described here - Googlebot is requesting random URLs on my site, like aecgeqfx.html or sutwjemebk.html. I am sure that I am not linking to these URLs from anywhere on my site.

I suspect this may be Google probing how we handle nonexistent content. To cite an answer to the linked question:

 [google is requesting random urls to] see if your site correctly 
 handles non-existent files (by returning a 404 response header)

We have a custom page for nonexistent content - a styled page saying "Content not found, if you believe you got here by error, please contact us", with a few internal links, served (naturally) with a 200 OK. The URL is served directly (no redirection to a single URL).

I am afraid this may count against the site at Google: they may not interpret the user-friendly page as a 404 Not Found, and may think we are trying to fake something and serve duplicate content.

How should I proceed to ensure that Google does not think the site is bogus, while still showing a user-friendly message to users who click on dead links by accident?


Answer 1:


The best practice would be to return the user friendly 404 page with a 404 response code, not a 200. Your web server should handle this for you relatively easily.
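As a rough illustration of that idea, here is a minimal sketch in Python's standard library. It is not tied to any particular web server from the question; KNOWN_PAGES, build_response, FriendlyNotFoundHandler, make_server and port 8000 are hypothetical names and values chosen for the example. The point it shows is that the friendly HTML body and the 404 status travel in the same response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical set of paths that really exist on the site.
KNOWN_PAGES = {"/": "<h1>Home</h1>"}

# The user-friendly body, modelled on the message quoted in the question.
NOT_FOUND_HTML = (
    "<h1>Content not found</h1>"
    "<p>If you believe you got here by error, please contact us.</p>"
    '<p><a href="/">Back to the home page</a></p>'
)


def build_response(path):
    """Return (status_code, html_body) for a requested path."""
    if path in KNOWN_PAGES:
        return 200, KNOWN_PAGES[path]
    # Unknown URL: the body is friendly, but the status stays 404.
    return 404, NOT_FOUND_HTML


class FriendlyNotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path may include a query string; this sketch ignores that.
        status, html = build_response(self.path)
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create (but do not start) a local server bound to 127.0.0.1."""
    return HTTPServer(("127.0.0.1", port), FriendlyNotFoundHandler)
```

Calling make_server().serve_forever() would start it locally; a request for a made-up path such as /aecgeqfx.html should then show the styled page in a browser while the response status is 404.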




Answer 2:


Use the ErrorDocument directive in Apache:

ErrorDocument 500 http://foo.example.com/cgi-bin/tester
ErrorDocument 404 /cgi-bin/bad_urls.pl
ErrorDocument 401 /subscription_info.html
ErrorDocument 403 "Sorry can't allow you access today"

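The error document can be whatever you would like. For the 404 case, point it at a local path rather than a full URL: with a full http:// URL, Apache sends a redirect and the client never sees the original status code. For example, if you are using PHP, you can create a file called error404.php like this: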

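<?php
// Send the 404 status line before any output is written.
header("HTTP/1.0 404 Not Found");

// The body can be any user-friendly content.
echo 'Hi, this page does not exist...<img src="nice-logo.png" alt="logo" />';
?>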

The only important thing is that the response carries a correct 404 status code in its headers, whether it is sent by Apache, PHP, or any other dynamic script.

Example of a funny 404 page: http://www.northernbrewer.com/brewing/weekly_fermenterd
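One way to confirm that the status code really is 404 is to do roughly what Googlebot appears to do: request a random, nonexistent URL and look at the status. The sketch below uses Python's standard library; random_probe_path, fetch_status and serves_real_404 are hypothetical helper names, and the base URL is whatever site you want to check.

```python
import random
import string
import urllib.error
import urllib.request


def random_probe_path(length=8):
    """Build a path that almost certainly does not exist, e.g. /aecgeqfx.html."""
    name = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return "/" + name + ".html"


def fetch_status(url):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still available.
        return error.code


def serves_real_404(base_url, fetch=fetch_status):
    """True if a random, nonexistent URL under base_url answers with 404."""
    return fetch(base_url.rstrip("/") + random_probe_path()) == 404
```

Because urlopen follows redirects, a site that redirects missing pages to an error page served with 200 OK makes serves_real_404 return False, which is exactly the situation the question worries about. The fetch parameter exists only so the check can be exercised without network access.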




Answer 3:


You can still send a 404 status and provide a user-friendly message for dead links in the same response. Even "normal users" should get the 404 status, even if the page doesn't look like your typical failure page. How you intercept the request depends on your web server. That's going to be a lot easier than detecting the user agent and doing something different for Googlebot.



Source: https://stackoverflow.com/questions/2547430/how-to-return-proper-404-for-google-while-providing-user-friendly-content-to-the
