How do I get the decimal value of a Unicode character in C#?

南笙 2020-12-31 04:48

How do I get the numeric value of a Unicode character in C#?

For example, if the Tamil character (U+0B85) is given, the output should be 2949 (i.e. <

5 Answers
  • 2020-12-31 05:05

    It's basically the same as Java. If you've got it as a char, you can just convert to int implicitly:

    char c = '\u0b85';
    
    // Implicit conversion: char is basically a 16-bit unsigned integer
    int x = c;
    Console.WriteLine(x); // Prints 2949
    

    If you've got it as part of a string, just get that single character first:

    string text = GetText();
    int x = text[2]; // Or whatever...
    

    Note that characters outside the Basic Multilingual Plane are represented as two UTF-16 code units (a surrogate pair). .NET can find the full Unicode code point, for example with char.ConvertToUtf32, but it takes more than a simple cast.
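
    For characters outside the Basic Multilingual Plane, one option is char.ConvertToUtf32, which reads the code point starting at a given string index and combines a surrogate pair when it finds one. A minimal sketch; the names CodePointDemo and CodePointAt are made up for illustration:

    ```csharp
    using System;

    public static class CodePointDemo
    {
        // Returns the Unicode code point that starts at text[index].
        // char.ConvertToUtf32 combines a surrogate pair into one value and
        // throws ArgumentException if index points at an unpaired surrogate
        // or at the low half of a pair.
        public static int CodePointAt(string text, int index)
        {
            return char.ConvertToUtf32(text, index);
        }
    }
    ```

    With this helper, CodePointDemo.CodePointAt("\u0b85", 0) returns 2949 and CodePointDemo.CodePointAt("\U00013000", 0) returns 77824.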

  • 2020-12-31 05:08

    This is an example of using Plane 1, the Supplementary Multilingual Plane (SMP):

    string single_character = "\U00013000"; // U+13000, the first Egyptian hieroglyph
    // In UTF-16 it takes two code units (4 bytes) instead of one (2 bytes)
    
    // Get the code point via UTF-32, a fixed-width 4-byte encoding (little-endian here)
    Encoding enc = new UTF32Encoding(false, true, true);
    byte[] b = enc.GetBytes(single_character);
    Int32 code = BitConverter.ToInt32(b, 0); // 77824 in decimal on a little-endian machine
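
    The same UTF-32 idea can be extended to a whole string by reading one code point from every 4 bytes. This is only a sketch: the names Utf32Demo and ToCodePoints are hypothetical, and the bytes are assembled by hand so the result does not depend on the machine's byte order.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    public static class Utf32Demo
    {
        // Encodes the text as little-endian UTF-32 and reads one code point
        // from every 4 bytes.
        public static List<int> ToCodePoints(string text)
        {
            // Arguments: bigEndian = false, byteOrderMark = false,
            // throwOnInvalidCharacters = true
            Encoding enc = new UTF32Encoding(false, false, true);
            byte[] bytes = enc.GetBytes(text);

            var codePoints = new List<int>();
            for (int i = 0; i < bytes.Length; i += 4)
            {
                // Little-endian: the least significant byte comes first.
                int cp = bytes[i]
                       | (bytes[i + 1] << 8)
                       | (bytes[i + 2] << 16)
                       | (bytes[i + 3] << 24);
                codePoints.Add(cp);
            }
            return codePoints;
        }
    }
    ```

    For example, Utf32Demo.ToCodePoints("a\U00013000") yields the two values 97 and 77824.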
    
  • 2020-12-31 05:13
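    char c = 'அ';
    short code = (short)c;    // 2949 here, but short overflows for chars above U+7FFF
    ushort code2 = (ushort)c; // 2949; ushort covers the full range of char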
    
  • 2020-12-31 05:15
    ((int)'அ').ToString()
    

    If you have the character as a char, you can cast that to an int, which will represent the character's numeric value. You can then print that out in any way you like, just like with any other integer.

    If you wanted hexadecimal output instead, you can use:

    ((int)'அ').ToString("X4")
    

    X is for hexadecimal, 4 is for zero-padding to four characters.
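
    The two format calls above can be combined into one small helper. This is just a sketch, and the names FormatDemo and Describe are made up for illustration:

    ```csharp
    using System;

    public static class FormatDemo
    {
        // Formats a single UTF-16 code unit as "decimal (U+HEX)".
        public static string Describe(char c)
        {
            int value = c; // implicit char-to-int conversion
            return value.ToString() + " (U+" + value.ToString("X4") + ")";
        }
    }
    ```

    FormatDemo.Describe('அ') returns "2949 (U+0B85)".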

  • 2020-12-31 05:26

    How do I get the numeric value of a Unicode character in C#?

    A char is not necessarily the whole Unicode code point. In languages that use UTF-16 strings, such as C#, you may actually need 2 chars to represent a single "logical" character. And your string lengths might not be what you expect. The MSDN documentation for the String.Length property says:

    "The Length property returns the number of Char objects in this instance, not the number of Unicode characters."

    • So, if your Unicode character is encoded in just one char, it is already numeric (essentially an unsigned 16-bit integer). You may want to cast it to some of the integer types, but this won't change the actual bits that were originally present in the char.
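    • If your Unicode character is 2 chars (a surrogate pair), you'll need to decode the pair into a single code point, for example with char.ConvertToUtf32, resulting in a uint numeric value:

      char c1 = ...; // high surrogate
      char c2 = ...; // low surrogate
      uint c = (uint)char.ConvertToUtf32(c1, c2);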
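
    For reference, the arithmetic behind a surrogate pair is the standard UTF-16 formula 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). A minimal sketch of it follows; the names SurrogateDemo and Combine are hypothetical, and in practice char.ConvertToUtf32(char, char) does the same job:

    ```csharp
    using System;

    public static class SurrogateDemo
    {
        // Combines a UTF-16 surrogate pair into one code point:
        // 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        public static uint Combine(char high, char low)
        {
            if (!char.IsSurrogatePair(high, low))
                throw new ArgumentException("Expected a high surrogate followed by a low surrogate.");

            return 0x10000u + (((uint)high - 0xD800u) << 10) + ((uint)low - 0xDC00u);
        }
    }
    ```

    For U+13000, which UTF-16 stores as the pair 0xD80C 0xDC00, SurrogateDemo.Combine('\uD80C', '\uDC00') returns 77824.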

    How do I get the decimal value of a Unicode character in C#?

    When you say "decimal", this usually means a character string containing only characters that a human being would interpret as decimal digits.

    • If you can represent your Unicode character by only one char, you can convert it to a decimal string simply by:

      char c = 'அ';
      string s = ((ushort)c).ToString();

    • If you have 2 chars for your Unicode character, convert them to a uint as described above, then call uint.ToString.

    --- EDIT ---

    AFAIK diacritical marks are considered separate "characters" (and separate code points) despite being visually rendered together with the "base" character. Each of these code points taken alone is still at most 2 UTF-16 code units.

    BTW I think the proper name for what you are talking about is not "character" but "combining character sequence" (also called a grapheme cluster). So yes, a single combining character sequence can have more than 1 code point and therefore more than 2 code units. If you want a decimal representation of such a sequence, you can probably do it most easily through BigInteger:

    string c = "\x0072\x0338\x0327\x0316\x0317\x0300\x0301\x0302\x0308\x0360";
    string s = (new BigInteger(Encoding.Unicode.GetBytes(c))).ToString();
    

    Depending on the order of significance you want for the code unit "digits", you may want to reverse c.
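
    If separate numbers per code point are more useful than one big integer, a possible alternative is to walk the string with StringInfo.GetTextElementEnumerator and list the code points of each text element. This is a sketch under that assumption, not part of the approach above; the names TextElementDemo and Describe are made up, and how .NET splits a string into text elements can vary between versions:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;

    public static class TextElementDemo
    {
        // Returns one entry per text element; each entry holds the decimal
        // code points of that element, separated by spaces.
        // Throws ArgumentException if the text contains an unpaired surrogate.
        public static List<string> Describe(string text)
        {
            var result = new List<string>();
            TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
            while (e.MoveNext())
            {
                string element = e.GetTextElement();
                var codePoints = new List<string>();
                for (int i = 0; i < element.Length; i++)
                {
                    codePoints.Add(char.ConvertToUtf32(element, i).ToString());
                    // Skip the low surrogate of the pair that was just decoded.
                    if (char.IsHighSurrogate(element[i])) i++;
                }
                result.Add(string.Join(" ", codePoints));
            }
            return result;
        }
    }
    ```

    For example, TextElementDemo.Describe("a\U00013000") returns the two entries "97" and "77824". For the string c above, the code points start with 114 and 824, and the combining marks would normally be grouped with the base letter into a single entry.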
