问题
I have two strings:
a = 'hà nội'
b = 'hà nội'
When I compare them with a == b
, it returns false
.
I checked the byte codes:
a.bytes = [104, 97, 204, 128, 32, 110, 195, 180, 204, 163, 105]
b.bytes = [104, 195, 160, 32, 110, 225, 187, 153, 105]
What is the cause? How can I fix it so that a == b
returns true
?
回答1:
This is an issue with Unicode equivalence.
In order to compare these strings you need to normalize them, so that they both use the same byte sequences for these types of characters.
a.unicode_normalize == b.unicode_normalize
unicode_normalize(form=:nfc)
[link]
Returns a normalized form of str, using Unicode normalizations NFC, NFD, NFKC, or NFKD. The normalization form used is determined by form, which is any of the four values :nfc, :nfd, :nfkc, or :nfkd. The default is :nfc.
If the string is not in a Unicode Encoding, then an Exception is raised. In this context, 'Unicode Encoding' means any of UTF-8, UTF-16BE/LE, and UTF-32BE/LE, as well as GB18030, UCS_2BE, and UCS_4BE. Anything else than UTF-8 is implemented by converting to UTF-8, which makes it slower than UTF-8.
来源:https://stackoverflow.com/questions/48472375/same-string-but-different-bytes-codes