I\'m using this code to generate U+10FFFC
var s = Encoding.UTF8.GetString(new byte[] {0xF4,0x8F,0xBF,0xBC});
I know it\'s for
While @R. Martinho Fernandes's answer is correct, his AsCodePoints
extension method has two issues:
ArgumentException
on invalid code points (high surrogate without low surrogate or vice versa).char
static methods that take (char)
or (string, int)
(such as char.IsNumber()
) if you only have int code points.I've split the code into two methods, one similar to the original but returns the Unicode Replacement Character on invalid code points. The second method returns a struct IEnumerable with more useful fields:
StringCodePointExtensions.cs
public static class StringCodePointExtensions {
const char ReplacementCharacter = '\ufffd';
public static IEnumerable CodePointIndexes(this string s) {
for (int i = 0; i < s.Length; i++) {
if (char.IsHighSurrogate(s, i)) {
if (i + 1 < s.Length && char.IsLowSurrogate(s, i + 1)) {
yield return CodePointIndex.Create(i, true, true);
i++;
continue;
} else {
// High surrogate without low surrogate
yield return CodePointIndex.Create(i, false, false);
continue;
}
} else if (char.IsLowSurrogate(s, i)) {
// Low surrogate without high surrogate
yield return CodePointIndex.Create(i, false, false);
continue;
}
yield return CodePointIndex.Create(i, true, false);
}
}
public static IEnumerable CodePointInts(this string s) {
return s
.CodePointIndexes()
.Select(
cpi => {
if (cpi.Valid) {
return char.ConvertToUtf32(s, cpi.Index);
} else {
return (int)ReplacementCharacter;
}
});
}
}
CodePointIndex.cs
:
public struct CodePointIndex {
public int Index;
public bool Valid;
public bool IsSurrogatePair;
public static CodePointIndex Create(int index, bool valid, bool isSurrogatePair) {
return new CodePointIndex {
Index = index,
Valid = valid,
IsSurrogatePair = isSurrogatePair,
};
}
}
To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work.