Expressing UTF-16 unicode characters in JavaScript

死守一世寂寞 2020-12-09 11:02

To express, for example, the character U+10400 in JavaScript, I use "\uD801\uDC00" or String.fromCharCode(0xD801) + String.fromCharCode(0xDC00)
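(Editor's note: assuming an ES2015-capable engine, later ECMAScript versions added a code point escape and String.fromCodePoint, which avoid writing the surrogate pair by hand. A sketch, not part of the original question:)

```javascript
// ES2015+ alternatives to hand-writing the surrogate pair for U+10400.
var a = "\uD801\uDC00";                // manual surrogate pair
var b = "\u{10400}";                   // ES2015 code point escape
var c = String.fromCodePoint(0x10400); // ES2015 builder function
console.log(a === b && b === c);       // true
```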

2 Answers
  • 2020-12-09 11:31

    Based on the Wikipedia article given by Henning Makholm, the following function will return the correct character for a code point:

    function getUnicodeCharacter(cp) {
    
        if (cp >= 0 && cp <= 0xD7FF || cp >= 0xE000 && cp <= 0xFFFF) {
            // BMP code point outside the surrogate range: a single code unit
            return String.fromCharCode(cp);
        } else if (cp >= 0x10000 && cp <= 0x10FFFF) {
    
            // we subtract 0x10000 from cp to get a 20-bit number
            // in the range 0..0xFFFFF
            cp -= 0x10000;
    
            // we add 0xD800 to the number formed by the high 10 bits
            // to give the first (high) surrogate code unit
            var first = ((0xFFC00 & cp) >> 10) + 0xD800;
    
            // we add 0xDC00 to the number formed by the low 10 bits
            // to give the second (low) surrogate code unit
            var second = (0x3FF & cp) + 0xDC00;
    
            return String.fromCharCode(first) + String.fromCharCode(second);
        }
    }
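    A quick worked check of the arithmetic above, using the question's U+10400 (values computed by hand):

    ```javascript
    // Worked check for U+10400, mirroring the arithmetic in getUnicodeCharacter.
    var cp = 0x10400 - 0x10000;          // 0x00400, fits in 20 bits
    var first = (cp >> 10) + 0xD800;     // high 10 bits -> 0xD801
    var second = (cp & 0x3FF) + 0xDC00;  // low 10 bits  -> 0xDC00
    console.log(String.fromCharCode(first, second) === "\uD801\uDC00"); // true
    ```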
    
  • 2020-12-09 11:38

    How do I find 0xD801 and 0xDC00 from 0x10400?

    JavaScript uses UCS-2 internally. That’s why String#charCodeAt() doesn’t work the way you’d want it to.
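    An illustration of that behaviour, using the U+10400 example from the question:

    ```javascript
    // charCodeAt and length count UTF-16 code units, not Unicode characters.
    var s = "\uD801\uDC00"; // one character, U+10400
    console.log(s.length);                     // 2, not 1
    console.log(s.charCodeAt(0).toString(16)); // "d801" (high surrogate only)
    console.log(s.charCodeAt(1).toString(16)); // "dc00" (low surrogate)
    ```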

    If you want to get the code point of every Unicode character (including non-BMP characters) in a string, you could use Punycode.js’s utility functions to convert between UCS-2 strings and UTF-16 code points:

    // String#charCodeAt() replacement that only considers full Unicode characters
    punycode.ucs2.decode('\uD801\uDC00'); // [0x10400]
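    (Editor's note: if a library dependency is undesirable, ES2015 engines expose the same capability natively via String#codePointAt — a sketch, not part of the original answer:)

    ```javascript
    // ES2015+: codePointAt decodes a full surrogate pair at the given index.
    var s = "\uD801\uDC00";
    console.log(s.codePointAt(0).toString(16)); // "10400"
    ```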