Question
We maintain a website that uses the letters æ, ø, and å in some of its page addresses. This worked fine, apart from some IE issues early on, until now. Over the last couple of weeks, search engine crawlers, especially Bing, seem to be encoding these letters over and over.
So we get 404 errors as the crawler tries to access the address /butikk/m%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A3%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%82%C2%A0%C3%83%C6%92%C3%82%C2%A2%C3%83%C2%A2%C3%A2%E2%82%AC%C5%A1%C3%82%C2%AC%C3%83%C2%A2%C3%A2%E2%82%AC%C5%BE%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%85%C2%A1%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%B8bler instead of /butikk/møbler. Using /butikk/m%c3%b8bler would also have taken you to the right page. Since we are using Play Framework, we also get a site error because our controllers can be no longer than 250 characters, but that is not the real issue here.
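For anyone trying to reproduce this: the pattern in that address looks like the UTF-8 bytes of "ø" being misread as Windows-1252 and then re-encoded as UTF-8, several times over. The following is only a rough Python sketch of that assumption, not what Bing actually does internally. The helper names `mangle` and `repair` are made up for illustration.

```python
from urllib.parse import quote, unquote


def mangle(text: str) -> str:
    """Simulate one round of the suspected bug: UTF-8 bytes misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str, max_rounds: int = 20) -> str:
    """Heuristically undo repeated mis-decoding; stop when a round cannot be reversed."""
    for _ in range(max_rounds):
        try:
            fixed = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # text is no longer valid mojibake, so assume it is clean
        if fixed == text:
            break  # pure ASCII, nothing left to undo
        text = fixed
    return text


good = "/butikk/møbler"
print(quote(good))          # the correctly percent-encoded address
print(quote(mangle(good)))  # one round of double encoding

# Three rounds of mangling, then percent-decoding and repairing.
broken = good
for _ in range(3):
    broken = mangle(broken)
print(repair(unquote(quote(broken))))
```

Each round roughly doubles the length of the encoded letter, which would explain why the address eventually grows past the 250-character limit. Note that `repair` is a heuristic: it gives up if the mangled text contains characters that Python's `cp1252` codec cannot map back to a byte.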
Initially, there was no sitemap on the site. We added one, with UTF-8 encoded addresses, hoping this would lead the bots the right way, but so far nothing.
Has anybody had a similar issue and solved it, or does anyone have suggestions for what we can do to make Bingbot use the right addresses? Any help would be appreciated.
Added info: Looking at Bing Webmaster Tools, I can see that Bing has indexed both the right address and a version with "Ã¸" instead of "ø". So my issue can hopefully be solved by removing the faulty address from the index.
Answer 1:
The best suggestion would be to leave special characters out of your filenames, links, and addresses. I had a similar issue a few years back with links containing ä, ö, and ü, which was resolved by simply removing the special characters and replacing them with plain ASCII characters.
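As a rough illustration of that approach, here is a minimal Python sketch that swaps the letters for ASCII before an address is generated. The mapping and the function name `ascii_path` are assumptions for the example; pick whatever replacements suit your language (for instance "oe" rather than "o" for ø).

```python
# Hypothetical replacement table; adjust the substitutions to taste.
REPLACEMENTS = str.maketrans({
    "æ": "ae", "ø": "o", "å": "a",
    "Æ": "Ae", "Ø": "O", "Å": "A",
    "ä": "a", "ö": "o", "ü": "u",
})


def ascii_path(path: str) -> str:
    """Return the path with the mapped letters replaced by ASCII equivalents."""
    return path.translate(REPLACEMENTS)


print(ascii_path("/butikk/møbler"))
```

If the old addresses are already indexed, they would still need to keep working or redirect to the new ones.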
Source: https://stackoverflow.com/questions/18953759/utf-8-encoding-in-page-addresses-issues-with-search-engine-crawlers