Question
I already have a 404 handler in the SPA, and it works. The problem is that Google, for example, links to old pages that no longer exist. While the user will see a custom 404 component, Google will get, I assume, a 200 OK and continue to think the page is valid.
{
  path: '*',
  name: 'not-found',
  component: NotFound // 404
}
I have the server rewrite requests to /index.html and let Vue handle the routing using history mode:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
</IfModule>
It's a standard Vue CLI install with a PHP backend. PHP is currently only used for API calls.
Is there a way to have the server return a 404 status code in this scenario?
Suggested solution: the server knows nothing about the routing happening in the frontend, but I could have webpack output a sitemap or something similar that the server can check requests against. For an unknown path, the server would set a 404 status in the header and still load the SPA, which then shows the 404 component. Would this be OK, or is there a better solution?
Note: I ended up automatically creating a sitemap and then checking the routes against the sitemap. If the route didn't match, it was rerouted to a custom 404. This worked reasonably well, but Google was still a bit confused.
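The sitemap check described above could look roughly like the sketch below. It is written in JavaScript only to show the matching logic; the route list, the ":param" pattern syntax, and the function names are assumptions for illustration, and the same idea would need to be ported to whatever runs on the server (PHP in this setup).

```javascript
// Hypothetical list of route patterns, e.g. generated at build time.
// A segment starting with ":" matches any single path segment.
const knownRoutes = ['/', '/about', '/products/:id'];

// Convert a route pattern into an anchored regular expression.
function patternToRegExp(pattern) {
  const source = pattern
    .split('/')
    .map((segment) =>
      segment.startsWith(':')
        ? '[^/]+'
        : segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    )
    .join('/');
  // Allow an optional trailing slash.
  return new RegExp('^' + source + '/?$');
}

// Return the HTTP status the server should send for a request path.
function statusForPath(path, routes) {
  // Ignore any query string or fragment.
  const pathname = path.split(/[?#]/)[0];
  return routes.some((r) => patternToRegExp(r).test(pathname)) ? 200 : 404;
}
```

With this list, statusForPath('/products/42', knownRoutes) gives 200 and statusForPath('/old-page', knownRoutes) gives 404. In both cases the server can still send index.html as the body, so the SPA renders its own 404 component.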
Answer 1:
I did some research on how an SPA can respond to search bot requests, so here we go: three working solutions.
Supporting links:
- Updating Page Title & Metadata with Vue.js & vue-router
Meta tag #1
Description:
HTTP code 404 means that the resource does not exist or was removed. For a removed resource, we want to tell Googlebot to drop the "dead" link from the search index. That is exactly what <meta name="robots" content="noindex"> is for.
As Google docs state:
You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a 'noindex' header in the HTTP request. When Googlebot next crawls that page and see the tag or header, Googlebot will drop that page entirely from Google Search results, regardless of whether other sites link to it.
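In a Vue SPA, the tag can be added when the catch-all route matches. The following is a minimal sketch, assuming the crawler executes JavaScript and that the catch-all route is named 'not-found' as in the question. The helper names are hypothetical, and `router` is expected to be a vue-router instance.

```javascript
// Add (or update) <meta name="robots" content="noindex"> in the given document.
// `doc` is passed in so the helper can also be tried outside a browser.
function setRobotsNoindex(doc) {
  let meta = doc.querySelector('meta[name="robots"]');
  if (!meta) {
    meta = doc.createElement('meta');
    meta.setAttribute('name', 'robots');
    doc.head.appendChild(meta);
  }
  meta.setAttribute('content', 'noindex');
  return meta;
}

// Hypothetical wiring: set the tag whenever the catch-all route is shown.
function installNoindexGuard(router, doc) {
  router.afterEach((to) => {
    if (to.name === 'not-found') {
      setRobotsNoindex(doc);
    }
  });
}
```

In the browser this would be called once as installNoindexGuard(router, document). The sketch does not remove the tag again when the user navigates from the 404 view to a valid page inside the SPA, so a real setup would likely want to handle that as well.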
Supporting links:
- https://searchengineland.com/meta-robots-tag-101-blocking-spiders-cached-pages-more-10665
- https://support.google.com/webmasters/answer/79812?hl=en
- https://support.google.com/webmasters/answer/93710?visit_id=636835318879056986-3786307088&rd=1
Meta tag #2
Description:
If we cannot (or do not want to) use our server to respond with 404 or any other code, we can try to perform an SEO-safe redirect that also works when JavaScript is disabled.
This redirect uses an HTML meta tag. An example (redirects to example.com immediately):
<meta http-equiv="refresh" content="0; url=http://example.com/">
Quote from StackOverflow answer:
As a reminder, and although it is not the preferred way to perform a redirect, Google accepts and follows pages having a Refresh tag with its delay set to 0, because, in some tricky cases, there is simply no other way to perform a redirect. This is the recommended method for Blogger pages (owned by Google).
HTTP code 301 will eventually be treated as 404 if you permanently redirect to a file that does not exist. From the Google docs (Prepare for 301 redirects):
While Googlebot and browsers can follow a "chain" of multiple redirects (e.g., Page 1 > Page 2 > Page 3), we advise redirecting to the final destination. If this is not possible, keep the number of redirects in the chain low, ideally no more than 3 and fewer than 5. Chaining redirects adds latency for users, and not all browsers support long redirect chains.
Supporting links:
- https://en.wikipedia.org/wiki/Meta_refresh
- SEO consequences of redirecting with META REFRESH
- http://sebastians-pamphlets.com/google-and-yahoo-treat-undelayed-meta-refresh-as-301-redirect/
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections#Permanent_redirections
JavaScript Redirect
Description:
Perform an onload redirect with window.location = '/404.html' to an invalid location (a file that does not exist), and integrate the Google Not Found Widget.
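One way to wire this redirect into vue-router is sketched below. It assumes the catch-all route is named 'not-found' as in the question, and that the server is configured so a request for /404.html is not rewritten back to index.html. With the rewrite rules from the question, a missing file is rewritten to index.html, so /404.html would need to be excluded there. The function name is hypothetical.

```javascript
// Leave the SPA when the catch-all route matches, so the server answers the
// request for /404.html itself instead of the client-side router.
function installHardNotFoundRedirect(router, win) {
  router.beforeEach((to, from, next) => {
    // If the server did serve the SPA for /404.html, do not redirect again;
    // this avoids an endless reload loop.
    const alreadyThere = win.location.pathname === '/404.html';
    if (to.name === 'not-found' && !alreadyThere) {
      win.location.href = '/404.html';
      next(false); // abort the client-side navigation
      return;
    }
    next();
  });
}
```

In the browser this would be installed once with installHardNotFoundRedirect(router, window), before the app is mounted.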
Supporting links:
- https://googleblog.blogspot.com/2008/10/helping-website-oweners-fix-broken.html
Source: https://stackoverflow.com/questions/54218371/how-to-get-a-404-response-in-vue-router