JavaScript strings - UTF-16 vs UCS-2?

一向 2020-12-01 05:53

I've read in some places that JavaScript strings are UTF-16, and in other places they're UCS-2. I did some searching around to try to figure out the difference and found t…

3 Answers
  • 2020-12-01 06:00

    It's just a 16-bit value, with no particular encoding specified by the ECMAScript standard.

    See section 7.8.4 String Literals in this document: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
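
    For example, nothing stops you from building a string out of arbitrary 16-bit values; an unpaired surrogate, which is not valid UTF-16 text, is still a perfectly legal string (illustrative snippet, standard behaviour):

        // A lone high surrogate: not well-formed UTF-16, but a valid JS string,
        // because the language only deals in 16-bit code units.
        const lone = '\uD83D';
        console.log(lone.length);                      // 1
        console.log(lone.charCodeAt(0).toString(16));  // 'd83d'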

  • 2020-12-01 06:06

    JavaScript (strictly speaking, ECMAScript) pre-dates Unicode 2.0, so in some cases you may find references to UCS-2 simply because that was correct at the time the reference was written. Can you point us to specific citations of JavaScript being "UCS-2"?

    The specifications for ECMAScript versions 3 and 5, at least, both explicitly declare a String to be a collection of unsigned 16-bit integers, and state that if those integer values are meant to represent textual data, then they are UTF-16 code units. See section 8.4 of the ECMAScript Language Specification.
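
    A quick illustration of that definition (illustrative snippet, not from the spec): you can assemble a string directly from 16-bit code-unit values, and a character outside the BMP simply occupies two of them:

        // U+1D306 (TETRAGRAM FOR CENTRE) is encoded in UTF-16 as the surrogate
        // pair 0xD834 0xDF06, i.e. two unsigned 16-bit integers.
        const tetra = String.fromCharCode(0xD834, 0xDF06);
        console.log(tetra);                             // '𝌆'
        console.log(tetra.charCodeAt(0).toString(16));  // 'd834'
        console.log(tetra.charCodeAt(1).toString(16));  // 'df06'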


    EDIT: I'm no longer sure my answer is entirely correct. See the excellent article mentioned above, http://mathiasbynens.be/notes/javascript-encoding, which in essence says that while a JavaScript engine may use UTF-16 internally, and most do, the language itself effectively exposes those characters as if they were UCS-2.
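
    Concretely (a small illustration of that UCS-2-like view, not taken from the article): every length and indexing operation counts 16-bit code units, so a single astral character looks like two "characters":

        const poo = '💩';                    // U+1F4A9, outside the Basic Multilingual Plane
        console.log(poo.length);             // 2  (two code units, one code point)
        console.log(poo.charAt(0));          // '\uD83D'  (just the lone high surrogate)
        console.log(poo === '\uD83D\uDCA9'); // true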

  • 2020-12-01 06:12

    It's UTF-16/UCS-2. It can handle surrogate pairs, but charAt/charCodeAt return a 16-bit code unit rather than the Unicode code point. If you want to handle surrogate pairs yourself, I suggest a quick read through this.
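
    For example (an illustrative sketch using the ES2015 code-point-aware APIs, which post-date this answer):

        const s = 'a😀b';                            // '😀' is U+1F600, stored as a surrogate pair

        // Code-unit view: charCodeAt sees only half of the pair.
        console.log(s.length);                       // 4
        console.log(s.charCodeAt(1).toString(16));   // 'd83d'

        // Code-point-aware view: codePointAt and string iteration pair the surrogates.
        console.log(s.codePointAt(1).toString(16));  // '1f600'
        console.log([...s]);                         // ['a', '😀', 'b']
        console.log(String.fromCodePoint(0x1F600));  // '😀'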
