Convert a Unicode string to an escaped ASCII string

广开言路 2020-11-22 04:00

How can I convert this string:

This string contains the Unicode character Pi(π)

into an escaped A

9 Answers
  •  -上瘾入骨i
    2020-11-22 05:01

    Here is my current implementation:

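    using System;
    using System.Globalization;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    public static class UnicodeStringExtensions
    {
        // Replaces every UTF-16 code unit outside printable ASCII (except CR/LF) with \uXXXX.
        public static string EncodeNonAsciiCharacters(this string value) {
            // UTF-16LE bytes; reading them back with BitConverter assumes a little-endian machine.
            var bytes = Encoding.Unicode.GetBytes(value);
            var sb = StringBuilderCache.Acquire(value.Length);
            bool encodedsomething = false;
            for (int i = 0; i < bytes.Length; i += 2) {
                var c = BitConverter.ToUInt16(bytes, i);
                if ((c >= 0x20 && c <= 0x7f) || c == 0x0A || c == 0x0D) {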
                    sb.Append((char) c);
                } else {
                    sb.Append($"\\u{c:x4}");
                    encodedsomething = true;
                }
            }
            if (!encodedsomething) {
                StringBuilderCache.Release(sb);
                return value;
            }
            return StringBuilderCache.GetStringAndRelease(sb);
        }
    
    
        // Decodes each run of consecutive \uXXXX escapes in one pass, so surrogate pairs stay together.
        public static string DecodeEncodedNonAsciiCharacters(this string value)
          => Regex.Replace(value,/*language=regexp*/@"(?:\\u[a-fA-F0-9]{4})+", Decode);
    
        static readonly string[] Splitsequence = new [] { "\\u" };
        private static string Decode(Match m) {
            var bytes = m.Value.Split(Splitsequence, StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => ushort.Parse(s, NumberStyles.HexNumber)).SelectMany(BitConverter.GetBytes).ToArray();
            return Encoding.Unicode.GetString(bytes);
        }
    }
    

    This passes a test:

    public void TestBigUnicode() {
        var s = "\U00020000";
        var encoded = s.EncodeNonAsciiCharacters();
        var decoded = encoded.DecodeEncodedNonAsciiCharacters();
        // Assert.Equals is object.Equals and does not compare the values; use AreEqual (NUnit/MSTest).
        Assert.AreEqual(s, decoded);
    }
    

    with the encoded value: "\ud840\udc00"
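
    The two escapes are the UTF-16 surrogate pair for U+20000. As a rough illustration of where those numbers come from, here is the standard surrogate arithmetic; SurrogateDemo and ToSurrogates are hypothetical names used only for this sketch and are not part of the implementation above:

```csharp
using System;

public static class SurrogateDemo
{
    // Splits a supplementary code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
    public static (ushort High, ushort Low) ToSurrogates(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;                  // 20-bit value
        ushort high = (ushort)(0xD800 + (offset >> 10));   // top 10 bits
        ushort low = (ushort)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    public static void Main()
    {
        var (high, low) = ToSurrogates(0x20000);
        Console.WriteLine($"\\u{high:x4}\\u{low:x4}");     // prints \ud840\udc00
    }
}
```

    Because the encoder works on 16-bit code units, it never sees U+20000 as one value; it just escapes each half of the pair.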

    This implementation uses StringBuilderCache, which is internal to .NET and has to be copied into your project from the reference source.
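
    If copying StringBuilderCache is not an option, the same idea can probably be expressed with a plain StringBuilder. The following is only a minimal sketch under that assumption, not a drop-in replacement: AsciiEscapeSketch, Encode and Decode are hypothetical names, and it iterates over chars directly instead of going through a byte array.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class AsciiEscapeSketch
{
    // Escapes every UTF-16 code unit outside 0x20..0x7f as \uXXXX (CR and LF pass through).
    public static string Encode(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if ((c >= 0x20 && c <= 0x7f) || c == '\n' || c == '\r')
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("x4", CultureInfo.InvariantCulture));
        }
        return sb.ToString();
    }

    // Turns each \uXXXX escape back into the code unit it names.
    public static string Decode(string value)
        => Regex.Replace(value, @"\\u([0-9a-fA-F]{4})",
            m => ((char)ushort.Parse(m.Groups[1].Value, NumberStyles.HexNumber,
                                     CultureInfo.InvariantCulture)).ToString());

    public static void Main()
    {
        string s = "Pi(\u03c0) and \U00020000";
        string encoded = Encode(s);
        Console.WriteLine(encoded);               // Pi(\u03c0) and \ud840\udc00
        Console.WriteLine(Decode(encoded) == s);  // True
    }
}
```

    Like the version above, this sketch does not escape backslashes, so input that already contains a literal \uXXXX sequence will not round-trip unchanged.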
