What characters are grouped with Array.from?

Happy的楠姐 2021-02-03 19:06

I've been playing around with JS and can't figure out how JS decides which elements to add to the created array when using Array.from(). For example, the followin

3 answers
  •  野性不改
    2021-02-03 19:41

    Array.from first checks whether its argument has an iterator and, if so, uses it. Strings do have one, so Array.from ends up calling String.prototype[Symbol.iterator]. The specification describes that method as follows:

    1. Let O be ? RequireObjectCoercible(this value).
    2. Let S be ? ToString(O).
    3. Return CreateStringIterator(S).
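A quick way to check that Array.from really goes through the string iterator is to drive that iterator by hand and compare the results. This is only a minimal sketch; the sample string is my own choice for illustration, not something from the question.

```javascript
// Sample string (an assumption for illustration): 'a', an emoji outside the BMP, 'b'.
const sample = 'a\u{1F600}b';

// Array.from consumes String.prototype[Symbol.iterator] ...
const viaArrayFrom = Array.from(sample);

// ... so calling that iterator manually should yield the same elements.
const viaIterator = [];
const iterator = sample[Symbol.iterator]();
for (let step = iterator.next(); !step.done; step = iterator.next()) {
  viaIterator.push(step.value);
}

console.log(sample.length);       // 4 (UTF-16 code units)
console.log(viaArrayFrom.length); // 3 (elements produced by the iterator)
console.log(viaIterator.length);  // 3
```

The emoji takes two code units in the string but arrives as a single array element, which is the behaviour the rest of this answer traces through the specification.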

    Looking up CreateStringIterator eventually takes you to 21.1.5.2.1 %StringIteratorPrototype%.next ( ), which does:

    1. Let cp be ! CodePointAt(s, position).
    2. Let resultString be the String value containing cp.[[CodeUnitCount]] consecutive code units from s beginning with the code unit at index position.
    3. Set O.[[StringNextIndex]] to position + cp.[[CodeUnitCount]].
    4. Return CreateIterResultObject(resultString, false).
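The steps above can be approximated in plain JavaScript. The sketch below is not the engine's implementation; `iterateByCodePoint` is a hypothetical helper name, and it leans on the built-in `String.prototype.codePointAt` to decide whether to advance by one or two code units.

```javascript
// Hypothetical helper: mimics the string iterator's stepping logic.
function iterateByCodePoint(s) {
  const result = [];
  let position = 0;
  while (position < s.length) {
    const cp = s.codePointAt(position);
    // Only code points above 0xFFFF occupy two UTF-16 code units;
    // an unpaired surrogate comes back as its own value (<= 0xFFFF).
    const codeUnitCount = cp > 0xFFFF ? 2 : 1;
    // "resultString" in the spec: codeUnitCount consecutive code units.
    result.push(s.slice(position, position + codeUnitCount));
    // "Set O.[[StringNextIndex]] to position + cp.[[CodeUnitCount]]".
    position += codeUnitCount;
  }
  return result;
}

console.log(iterateByCodePoint('a\u{1F600}b').length); // 3
```

For any string I have tried, this returns the same elements as Array.from, including strings that contain unpaired surrogates.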

    The CodeUnitCount is what you're interested in. This number comes from CodePointAt:

    1. Let first be the code unit at index position within string.
    2. Let cp be the code point whose numeric value is that of first.
    3. If first is not a leading surrogate or trailing surrogate, then

      a. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: false }.

    4. If first is a trailing surrogate or position + 1 = size, then

      a. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.

    5. Let second be the code unit at index position + 1 within string.

    6. If second is not a trailing surrogate, then

      a. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.

    7. Set cp to ! UTF16DecodeSurrogatePair(first, second).

    8. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 2, [[IsUnpairedSurrogate]]: false }.
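To make the branches concrete, here is a hand-written sketch of the CodePointAt steps. The names `codePointAtRecord`, `isLeadingSurrogate` and `isTrailingSurrogate` are hypothetical; the surrogate ranges (0xD800 to 0xDBFF and 0xDC00 to 0xDFFF) are the standard UTF-16 ones, and the sketch assumes `position` is a valid index into the string.

```javascript
// Standard UTF-16 surrogate ranges.
const isLeadingSurrogate = (unit) => unit >= 0xD800 && unit <= 0xDBFF;
const isTrailingSurrogate = (unit) => unit >= 0xDC00 && unit <= 0xDFFF;

// Hypothetical helper mirroring the spec steps; assumes 0 <= position < string.length.
function codePointAtRecord(string, position) {
  const first = string.charCodeAt(position);
  // Step 3: an ordinary code unit stands for itself.
  if (!isLeadingSurrogate(first) && !isTrailingSurrogate(first)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  // Step 4: a trailing surrogate on its own, or a leading one at the very end.
  if (isTrailingSurrogate(first) || position + 1 === string.length) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // Steps 5-6: a leading surrogate not followed by a trailing one.
  const second = string.charCodeAt(position + 1);
  if (!isTrailingSurrogate(second)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // Steps 7-8: decode the pair with the usual UTF-16 formula.
  const codePoint = (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
  return { codePoint, codeUnitCount: 2, isUnpairedSurrogate: false };
}

console.log(codePointAtRecord('\u{1F600}', 0).codeUnitCount); // 2
console.log(codePointAtRecord('\uD83Dx', 0).codeUnitCount);   // 1
```

The only path that yields a count of 2 is the last one, where a leading surrogate is immediately followed by a trailing surrogate.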

    So, when iterating over a string with Array.from, CodeUnitCount is 2 only when the code unit at the current position is a leading surrogate that is immediately followed by a trailing surrogate; in every other case the iterator advances by a single code unit. The code units that count as surrogates are described here:

    Such operations apply special treatment to every code unit with a numeric value in the inclusive range 0xD800 to 0xDBFF (defined by the Unicode Standard as a leading surrogate, or more formally as a high-surrogate code unit) and every code unit with a numeric value in the inclusive range 0xDC00 to 0xDFFF (defined as a trailing surrogate, or more formally as a low-surrogate code unit) using the following rules..:

    षि is not a surrogate pair:

    console.log('षि'.charCodeAt(0)); // First code unit: 2359, or 0x937
    console.log('षि'.charCodeAt(1)); // Second code unit: 2367, or 0x93F
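As a small check of what that means for Array.from, the sketch below writes the same two code units with escapes (U+0937 DEVANAGARI LETTER SSA and U+093F DEVANAGARI VOWEL SIGN I) and counts the resulting elements. Neither value falls in the surrogate ranges, so each one becomes its own array element.

```javascript
// 'षि' written with escapes: U+0937 followed by U+093F.
const text = '\u0937\u093F';
const parts = Array.from(text);

console.log(text.length);  // 2 code units
console.log(parts.length); // 2 elements, one per code point
console.log(parts.map((ch) => ch.codePointAt(0).toString(16))); // [ '937', '93f' ]
```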

    But
