Convert a Unicode string to an escaped ASCII string

广开言路 2020-11-22 04:00

How can I convert this string:

This string contains the Unicode character Pi(π)

into an escaped ASCII string?

  • 2020-11-22 04:37

    To unescape, you can simply use these functions:

    System.Text.RegularExpressions.Regex.Unescape(string)
    
    System.Uri.UnescapeDataString(string)
    

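    I suggest this method when the input is percent-encoded UTF-8 (for example %CF%80). Note that it does not decode \uXXXX escapes, which need Regex.Unescape:

    System.Uri.UnescapeDataString(string)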
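    The two functions decode different formats, which is easy to check with a small console program. This is a sketch; the sample strings are only illustrations:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class UnescapeDemo
    {
        static void Main()
        {
            // Regex.Unescape understands \uXXXX escapes (plus \n, \t and other regex escapes).
            string fromRegex = Regex.Unescape(@"Pi(\u03c0)");

            // Uri.UnescapeDataString decodes percent-encoded UTF-8 instead;
            // %CF%80 is the UTF-8 encoding of U+03C0 (π).
            string fromUri = Uri.UnescapeDataString("Pi(%CF%80)");

            // Uri.UnescapeDataString leaves \uXXXX sequences untouched.
            string untouched = Uri.UnescapeDataString(@"Pi(\u03c0)");

            Console.WriteLine(fromRegex == fromUri);   // True: both are "Pi(π)"
            Console.WriteLine(untouched);              // Pi(\u03c0)
        }
    }
    ```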
    
  • 2020-11-22 04:39

    This converts a string to the \uXXXX format and back again.

    using System;
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;
    
    class Program {
        static void Main( string[] args ) {
            string unicodeString = "This function contains a unicode character pi (\u03c0)";
    
            Console.WriteLine( unicodeString );
    
            string encoded = EncodeNonAsciiCharacters( unicodeString );
            Console.WriteLine( encoded );
    
            string decoded = DecodeEncodedNonAsciiCharacters( encoded );
            Console.WriteLine( decoded );
        }
    
        static string EncodeNonAsciiCharacters( string value ) {
            StringBuilder sb = new StringBuilder();
            foreach( char c in value ) {
                if( c > 127 ) {
                    // This character is too big for ASCII: emit \u plus 4 lowercase hex digits
                    string encodedValue = "\\u" + ((int) c).ToString( "x4" );
                    sb.Append( encodedValue );
                }
                else {
                    sb.Append( c );
                }
            }
            return sb.ToString();
        }
    
        static string DecodeEncodedNonAsciiCharacters( string value ) {
            // Match only hex digits after \u, so int.Parse cannot fail on input like \uzzzz
            return Regex.Replace(
                value,
                @"\\u(?<Value>[a-fA-F0-9]{4})",
                m => {
                    return ((char) int.Parse( m.Groups["Value"].Value, NumberStyles.HexNumber )).ToString();
                } );
        }
    }
    

    Outputs:

    This function contains a unicode character pi (π)

    This function contains a unicode character pi (\u03c0)

    This function contains a unicode character pi (π)

  • 2020-11-22 04:43
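    using System;
    using System.Linq;
    
    // Wrapper class added only so the snippet compiles; the name is arbitrary
    static class AsciiEscaper
    {
      // Apply proc to every char and concatenate the resulting strings
      static string StringFold(string input, Func<char, string> proc)
      {
        return string.Concat(input.Select(proc).ToArray());
      }
      
      // Escape non-ASCII chars as \uXXXX; leave ASCII chars unchanged
      static string FoldProc(char input)
      {
        if (input >= 128)
        {
          return string.Format(@"\u{0:x4}", (int)input);
        }
        return input.ToString();
      }
      
      public static string EscapeToAscii(string input)
      {
        return StringFold(input, FoldProc);
      }
    }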
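    One detail in FoldProc is worth spelling out: the format string is verbatim (@), so \u is copied literally and {0:x4} formats the char's numeric value as four lowercase hex digits. In a regular string literal the backslash must be doubled instead, because "\u{" is not a valid escape there and will not compile. A small sketch comparing the spellings (the class and variable names are only illustrative):

    ```csharp
    using System;

    static class FormatDemo
    {
        static void Main()
        {
            char pi = '\u03c0';   // π, numeric value 960 = 0x3C0
            int codeUnit = pi;    // implicit char -> int conversion

            // Verbatim string: the backslash is literal; {0:x4} gives 4 lowercase hex digits.
            string verbatim = string.Format(@"\u{0:x4}", codeUnit);

            // Regular string: the backslash itself has to be escaped as \\.
            string regular = string.Format("\\u{0:x4}", codeUnit);

            // Interpolated verbatim string: same result.
            string interpolated = $@"\u{codeUnit:x4}";

            Console.WriteLine(verbatim);       // \u03c0
            Console.WriteLine(regular);        // \u03c0
            Console.WriteLine(interpolated);   // \u03c0
        }
    }
    ```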
    
  • 2020-11-22 04:44

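    You can use the static Encoding.Convert() method. Note that the result is lossy ASCII rather than \uXXXX escapes: by default, Encoding.ASCII replaces every character it cannot encode with a question mark, so Π comes out as ?. The steps are: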

    • Create an Encoding object that represents ASCII encoding
    • Create an Encoding object that represents Unicode encoding
    • Call Encoding.Convert() with the source encoding, the destination encoding, and the byte array to be converted (obtained from the source encoding's GetBytes())

    Here is an example:

    using System;
    using System.Text;
    
    namespace ConvertExample
    {
       class ConvertExampleClass
       {
          static void Main()
          {
             string unicodeString = "This string contains the unicode character Pi(\u03a0)";
    
             // Create two different encodings.
             Encoding ascii = Encoding.ASCII;
             Encoding unicode = Encoding.Unicode;
    
             // Convert the string into a byte[].
             byte[] unicodeBytes = unicode.GetBytes(unicodeString);
    
             // Perform the conversion from one encoding to the other.
             byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
    
             // Convert the new byte[] into a char[] and then into a string.
             // This is a slightly different approach to converting to illustrate
             // the use of GetCharCount/GetChars.
             char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
             ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
             string asciiString = new string(asciiChars);
    
             // Display the strings created before and after the conversion.
             Console.WriteLine("Original string: {0}", unicodeString);
             Console.WriteLine("Ascii converted string: {0}", asciiString);
          }
       }
    }
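    The replacement behavior is easy to confirm. The sketch below (RoundTripThroughAscii is a hypothetical helper name) performs the same ASCII round trip as the example above, without the intermediate UTF-16 byte array:

    ```csharp
    using System;
    using System.Text;

    static class AsciiFallbackDemo
    {
        // Encode to ASCII bytes and decode again; characters that ASCII
        // cannot represent do not survive the trip.
        public static string RoundTripThroughAscii(string s) =>
            Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));

        static void Main()
        {
            string original = "This string contains the unicode character Pi(\u03a0)";
            // Encoding.ASCII's default encoder fallback substitutes '?' for Π.
            Console.WriteLine(RoundTripThroughAscii(original));
            // This string contains the unicode character Pi(?)
        }
    }
    ```

    If you need the original characters back later, one of the \uXXXX approaches in the other answers is the better fit.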
    
  • 2020-11-22 04:47
    using System;
    using System.Text;
    
    class Program
    {
        static void Main(string[] args)
        {
            string originalString = "This string contains the unicode character Pi(π)";
            StringBuilder asAscii = new StringBuilder(); // stores the final ASCII string with escaped chars
            foreach (char c in originalString)
            {
                // Keep ASCII chars as-is; otherwise append the \uXXXX escape of the char's value
                int cint = c;
                if (cint <= 127)
                    asAscii.Append(c);
                else
                    asAscii.Append(string.Format("\\u{0:x4}", cint));
            }
            Console.WriteLine("Final string: {0}", asAscii);
            Console.ReadKey();
        }
    }
    

    All non-ASCII chars are converted to their \uXXXX representation (the hexadecimal value of each UTF-16 char) and appended to the final string.
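    Because a C# char is a UTF-16 code unit rather than a full Unicode code point, characters outside the Basic Multilingual Plane are stored as two chars (a surrogate pair), and a per-char loop like this one escapes each half separately. The sketch below uses a hypothetical Escape helper that mirrors the loop; it shows that U+1F600 becomes two escapes and that Regex.Unescape still restores the original string:

    ```csharp
    using System;
    using System.Text;
    using System.Text.RegularExpressions;

    static class SurrogateDemo
    {
        // Same per-char escaping as the loop above: each char above 127 becomes \uXXXX.
        public static string Escape(string value)
        {
            var sb = new StringBuilder();
            foreach (char c in value)
            {
                if (c > 127) sb.Append("\\u").Append(((int)c).ToString("x4"));
                else sb.Append(c);
            }
            return sb.ToString();
        }

        static void Main()
        {
            // U+1F600 lies outside the BMP, so the string holds the pair D83D DE00.
            string smiley = char.ConvertFromUtf32(0x1F600);
            string escaped = Escape(smiley);
            Console.WriteLine(escaped);                             // \ud83d\ude00
            Console.WriteLine(Regex.Unescape(escaped) == smiley);   // True
        }
    }
    ```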

  • 2020-11-22 04:52

    As a one-liner:

    var result = Regex.Replace(input, @"[^\x00-\x7F]", c => 
        string.Format(@"\u{0:x4}", (int)c.Value[0]));
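    For the reverse direction, a matching one-liner can also be built with Regex.Replace. A minimal sketch, assuming every escape is \u followed by exactly four hex digits (the OneLiners class and its method names are hypothetical):

    ```csharp
    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    static class OneLiners
    {
        // Escape every char outside 0x00-0x7F as \uXXXX (the one-liner above).
        public static string Escape(string input) =>
            Regex.Replace(input, @"[^\x00-\x7F]", m => string.Format(@"\u{0:x4}", (int)m.Value[0]));

        // Inverse: turn each \uXXXX back into the char with that hex value.
        public static string Unescape(string input) =>
            Regex.Replace(input, @"\\u([0-9a-fA-F]{4})",
                m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

        static void Main()
        {
            string escaped = Escape("Pi(\u03c0)");
            Console.WriteLine(escaped);                          // Pi(\u03c0)
            Console.WriteLine(Unescape(escaped) == "Pi(\u03c0)"); // True
        }
    }
    ```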
    