What characters are grouped with Array.from?

Happy的楠姐 2021-02-03 19:06

I've been playing around with JS and can't figure out how JS decides which elements to add to the created array when using Array.from(). For example, the followin

3 answers

野性不改 (original poster)

2021-02-03 19:41
Array.from first tries to invoke the iterator of its argument, if it has one. Strings do have an iterator, so Array.from invokes String.prototype[Symbol.iterator]. Let's look up how that prototype method works. The specification describes it as follows:
1. Let O be ? RequireObjectCoercible(this value).
2. Let S be ? ToString(O).
3. Return CreateStringIterator(S).
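One way to check that Array.from really goes through the string's iterator is to drive that iterator by hand and compare the results. This is only a small sketch; the `sample` string is an assumption chosen for illustration (it is not from the question) and contains one emoji that is stored as two code units.

```javascript
// Hypothetical sample: 'a', an emoji (U+1F600, stored as two code units), 'b'.
const sample = 'a\u{1F600}b';

// Array.from consumes the string's iterator.
const viaArrayFrom = Array.from(sample);

// Drive the same iterator manually, one next() call at a time.
const iterator = sample[Symbol.iterator]();
const viaIterator = [];
for (let step = iterator.next(); !step.done; step = iterator.next()) {
  viaIterator.push(step.value);
}

console.log(sample.length);       // 4 (code units)
console.log(viaArrayFrom.length); // 3 (elements)
console.log(viaIterator.length);  // 3 (elements)
```

Both approaches produce the same three elements, even though the string is four code units long.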
Looking up CreateStringIterator eventually takes you to 21.1.5.2.1 %StringIteratorPrototype%.next ( ), which does:
1. Let cp be ! CodePointAt(s, position).
2. Let resultString be the String value containing cp.[[CodeUnitCount]] consecutive code units from s beginning with the code unit at index position.
3. Set O.[[StringNextIndex]] to position + cp.[[CodeUnitCount]].
4. Return CreateIterResultObject(resultString, false).
The [[CodeUnitCount]] is what you're interested in. This number comes from CodePointAt:
1. Let first be the code unit at index position within string.
2. Let cp be the code point whose numeric value is that of first.
3. If first is not a leading surrogate or trailing surrogate, then
  a. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: false }.
4. If first is a trailing surrogate or position + 1 = size, then
  a. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.
5. Let second be the code unit at index position + 1 within string.
6. If second is not a trailing surrogate, then
  a. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.
7. Set cp to ! UTF16DecodeSurrogatePair(first, second).
8. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 2, [[IsUnpairedSurrogate]]: false }.
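The steps above can be sketched in plain JavaScript. This is an illustrative re-implementation, not engine code: the names `codePointAtSketch` and `splitLikeStringIterator` are made up for this example, and the surrogate-pair decoding formula is the standard UTF-16 one rather than something quoted in this answer.

```javascript
// Helper predicates for the two surrogate ranges (hypothetical names).
const isLeadingSurrogate = (unit) => unit >= 0xD800 && unit <= 0xDBFF;
const isTrailingSurrogate = (unit) => unit >= 0xDC00 && unit <= 0xDFFF;

// Mirrors the CodePointAt steps listed above.
function codePointAtSketch(string, position) {
  const size = string.length;
  const first = string.charCodeAt(position); // steps 1-2
  // Step 3: an ordinary code unit stands on its own.
  if (!isLeadingSurrogate(first) && !isTrailingSurrogate(first)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  // Step 4: a lone trailing surrogate, or a leading surrogate at the very end.
  if (isTrailingSurrogate(first) || position + 1 === size) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // Steps 5-6: a leading surrogate not followed by a trailing one is unpaired.
  const second = string.charCodeAt(position + 1);
  if (!isTrailingSurrogate(second)) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // Steps 7-8: decode the valid pair into a single code point.
  const codePoint = (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
  return { codePoint, codeUnitCount: 2, isUnpairedSurrogate: false };
}

// Mirrors the iterator's next(): advance by codeUnitCount each time.
function splitLikeStringIterator(string) {
  const result = [];
  let position = 0;
  while (position < string.length) {
    const { codeUnitCount } = codePointAtSketch(string, position);
    result.push(string.slice(position, position + codeUnitCount));
    position += codeUnitCount;
  }
  return result;
}

console.log(splitLikeStringIterator('a\u{1F600}b').length); // 3
```

For the inputs tried here, `splitLikeStringIterator` groups code units the same way `Array.from` does, including strings with unpaired surrogates.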
So, when iterating over a string with Array.from, the iterator gets a CodeUnitCount of 2, and therefore groups two code units into one element, only when the code unit at the current position is a leading surrogate immediately followed by a trailing surrogate. The code units that are interpreted as surrogates are described here:

Such operations apply special treatment to every code unit with a numeric value in the inclusive range 0xD800 to 0xDBFF (defined by the Unicode Standard as a leading surrogate, or more formally as a high-surrogate code unit) and every code unit with a numeric value in the inclusive range 0xDC00 to 0xDFFF (defined as a trailing surrogate, or more formally as a low-surrogate code unit) using the following rules..:
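To see these ranges in practice, each code unit of a string can be inspected with charCodeAt. The snippet below is a minimal sketch; `classifyCodeUnits` is a hypothetical helper name, and the two sample inputs are assumptions picked for illustration.

```javascript
// Label every UTF-16 code unit of `text` using the two ranges quoted above.
function classifyCodeUnits(text) {
  const kinds = [];
  for (let index = 0; index < text.length; index++) {
    const unit = text.charCodeAt(index);
    let kind = 'ordinary';
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      kind = 'leading surrogate';
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      kind = 'trailing surrogate';
    }
    kinds.push({ hex: unit.toString(16), kind });
  }
  return kinds;
}

// An emoji outside the Basic Multilingual Plane is stored as a surrogate pair.
console.log(classifyCodeUnits('\u{1F600}'));
// [ { hex: 'd83d', kind: 'leading surrogate' },
//   { hex: 'de00', kind: 'trailing surrogate' } ]

// A plain letter is a single ordinary code unit.
console.log(classifyCodeUnits('A'));
// [ { hex: '41', kind: 'ordinary' } ]
```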

षि is not a surrogate pair:
```
console.log('षि'.charCodeAt()); // First character code: 2359, or 0x937
console.log('षि'.charCodeAt(1)); // Second character code: 2367, or 0x93F
```
But