JavaScript string comparison fails when comparing Unicode characters

时光怂恿深爱的人放手 posted on 2019-11-27 04:40:04

Unlike what some other people here have said, this has nothing to do with encodings. Rather, your two strings use different code points to render the same visual characters.

To solve this correctly, you need to perform Unicode normalization on the two strings before comparing them. Unfortunately, older JavaScript engines don't have this functionality built in (String.prototype.normalize() only arrived with ES2015). For those environments, here is a JavaScript library that can perform the normalization for you: https://github.com/walling/unorm
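On engines that support ES2015, the built-in normalize() string method can do the same job without a library. A minimal sketch, where equalsNormalized is a hypothetical helper name and NFC is just one reasonable choice of normalization form:

```javascript
// Two strings that both render as "é" but use different code points.
const precomposed = "\u00E9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301"; // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// Hypothetical helper: compare two strings after bringing both
// into the same Unicode normalization form (NFC by default).
function equalsNormalized(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

console.log(precomposed === decomposed);                // false
console.log(equalsNormalized(precomposed, decomposed)); // true
```

Normalizing both sides matters: normalizing only one of them still fails if the other arrives in a different form.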

The JavaScript equality operator == can appear to fail under the following circumstances. In every case the cause is programmer error, not a bug in JavaScript.

  1. The two strings do not contain the same number and sequence of characters.

  2. There is whitespace or a newline before, within, or after one of the strings. Call trim() on both to strip leading and trailing whitespace, and look closely at both strings.

  3. Surprise type coercion. The programmer is comparing values of different data types, so == converts them before comparing.

  4. One string contains Unicode characters that look identical to characters in the other string but are in fact different code points.
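The causes listed above can be told apart with a few quick checks. The following is only a sketch: codePoints is a hypothetical helper name, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels,
// which makes invisible or look-alike characters easy to spot.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Cause 2: stray whitespace or newlines.
const typed = "caf\u00E9\n";
console.log(typed === "caf\u00E9");        // false
console.log(typed.trim() === "caf\u00E9"); // true

// Cause 3: == converts operands of different types before comparing.
console.log(0 == "");   // true
console.log("1" == 1);  // true
console.log("1" === 1); // false

// Cause 4: look-alike characters with different code points.
const latinA = "A";         // U+0041 LATIN CAPITAL LETTER A
const cyrillicA = "\u0410"; // U+0410 CYRILLIC CAPITAL LETTER A
console.log(latinA === cyrillicA);                  // false
console.log(codePoints(latinA), codePoints(cyrillicA));
```

Note that look-alikes from different scripts, such as the Latin and Cyrillic letters here, stay distinct even after Unicode normalization.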

user2428118

Unicode is a complex thing. It has two different representations for characters such as á and é: a single precomposed code point, or a base letter followed by a combining accent. As you can already see in the URL-encoded version, the bytes that make up the character differ between the two versions.
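The difference in the URL-encoded bytes can be reproduced with encodeURIComponent. This is a small sketch using á as the sample character; the variable names are chosen here for illustration.

```javascript
// The same visible character "á" written in two different ways.
const singleCodePoint = "\u00E1";   // U+00E1 LATIN SMALL LETTER A WITH ACUTE
const letterPlusAccent = "a\u0301"; // U+0061 followed by U+0301 COMBINING ACUTE ACCENT

// encodeURIComponent percent-encodes the UTF-8 bytes of each string.
console.log(encodeURIComponent(singleCodePoint));  // "%C3%A1"
console.log(encodeURIComponent(letterPlusAccent)); // "a%CC%81"

// After normalizing to the same form, the encoded bytes agree.
console.log(encodeURIComponent(letterPlusAccent.normalize("NFC"))); // "%C3%A1"
```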

See this answer for more information.

I had this same problem.

Adding

<meta charset="UTF-8">

to the HTML file fixed the issue.

In my case the templating engine was baking a JSON string into the HTML file. This string was in Unicode.

Although the template was also a Unicode file, the JS engine was treating the string I wrote into the template as a Latin-1 encoded string until I added the meta tag.

I was comparing the typed-in string to one of the JSON object's items (location.title == "Mühle").
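One plausible way to picture this failure, assuming the page was saved as UTF-8 but read as Latin-1, is to decode the UTF-8 bytes of "Mühle" one byte per character. The variable names are illustrative, and this is a simulation of the mismatch rather than what a browser literally runs.

```javascript
// The string as the author intended it ("\u00FC" is ü).
const intended = "M\u00FChle";

// UTF-8 encodes ü as the two bytes 0xC3 0xBC.
const utf8Bytes = new TextEncoder().encode(intended);

// Reading those bytes as Latin-1 maps every byte to one character,
// so ü turns into the two characters U+00C3 and U+00BC.
const misread = String.fromCharCode(...utf8Bytes);

console.log(intended.length);      // 5
console.log(misread.length);       // 6
console.log(misread === intended); // false
```

Declaring the charset makes the browser decode the file as UTF-8, so the literal in the page and the typed-in string end up as the same sequence of characters.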
