ด้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้дด็็็็็้้้้้็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้
I found some interesting characters just
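The byte sequences you mention are UTF-8 encodings. Thai code points lie in the range U+0E00 to U+0E7F, and UTF-8 encodes everything from U+0800 to U+FFFF in 3 bytes, which is why each character needs 3 bytes. The respective Unicode code points are: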
DO DEK 0x0e14
MAI THO 0x0e49
MAITAIKHU 0x0e47
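To check the 3-byte claim for the three code points listed above, here is a minimal sketch. The class name Utf8Demo and the helper Utf8ByteCount are made up for this illustration; the calls themselves (char.ConvertFromUtf32, Encoding.UTF8.GetBytes, BitConverter.ToString) are standard .NET.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hypothetical helper: number of bytes UTF-8 needs for one code point.
    public static int Utf8ByteCount(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint);
        return Encoding.UTF8.GetByteCount(s);
    }

    public static void Main()
    {
        foreach (int cp in new[] { 0x0E14, 0x0E49, 0x0E47 })
        {
            // Encode the single code point and show its UTF-8 bytes in hex.
            byte[] bytes = Encoding.UTF8.GetBytes(char.ConvertFromUtf32(cp));
            Console.WriteLine("U+{0:X4}: {1} bytes ({2})",
                cp, bytes.Length, BitConverter.ToString(bytes));
        }
    }
}
```

This should print E0-B8-94, E0-B9-89 and E0-B9-87 for the three code points, each 3 bytes long.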
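The latter two are in the category Mark, Nonspacing (Mn), which is what makes them render combined with the preceding base character instead of occupying a position of their own. Their Canonical_Combining_Class property only governs how sequences of marks are ordered during normalization: it is 107 for MAI THO and, as far as I can tell from the Unicode Character Database, 0 for MAITAIKHU.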
Your example starts with a single base character and stacks lots of nonspacing marks on top of it.
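If you want to confirm the categories yourself, something like the following should do. It is only a sketch: MarkDemo and IsNonSpacingMark are names invented for this example, while CharUnicodeInfo.GetUnicodeCategory and UnicodeCategory.NonSpacingMark are part of System.Globalization.

```csharp
using System;
using System.Globalization;

public static class MarkDemo
{
    // Hypothetical helper: true if the char is in category Mn (Mark, Nonspacing).
    public static bool IsNonSpacingMark(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
    }

    public static void Main()
    {
        foreach (char c in new[] { '\u0E14', '\u0E49', '\u0E47' })
        {
            // Print the code point together with its general category.
            Console.WriteLine("U+{0:X4}: {1}",
                (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
    }
}
```

DO DEK should come out as OtherLetter (a base character), and the two marks as NonSpacingMark.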
Compare with this C# code:
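// One base consonant followed by two nonspacing marks.
char DODEK = (char)0x0e14;
char MAITHO = (char)0x0e49;
char MAITAIKHU = (char)0x0e47;
string thai = new string(new char[] { DODEK, MAITHO, MAITAIKHU });
// string.Length counts UTF-16 code units; all three characters are in the BMP,
// so here this equals the number of code points.
Console.WriteLine("number of code points: " + thai.Length);
// StringInfo groups a base character and its combining marks into one text element.
var si = new System.Globalization.StringInfo(thai);
Console.WriteLine("number of text elements: " + si.LengthInTextElements);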
Output:
number of code points: 3
number of text elements: 1
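The same holds for a long run of marks like the one in the question. A rough sketch, assuming a string built from one DO DEK followed by alternating MAI THO and MAITAIKHU (StackedMarksDemo and BuildStacked are names invented here, and the count of 40 is arbitrary):

```csharp
using System;
using System.Globalization;
using System.Text;

public static class StackedMarksDemo
{
    // Hypothetical helper: one base character followed by `count` alternating marks.
    public static string BuildStacked(int count)
    {
        var sb = new StringBuilder();
        sb.Append('\u0E14');                              // DO DEK (base)
        for (int i = 0; i < count; i++)
        {
            sb.Append(i % 2 == 0 ? '\u0E49' : '\u0E47');  // MAI THO / MAITAIKHU
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string stacked = BuildStacked(40);
        Console.WriteLine("UTF-16 code units: " + stacked.Length);
        Console.WriteLine("text elements: " +
            new StringInfo(stacked).LengthInTextElements);
    }
}
```

I would expect 41 code units but still only 1 text element, which is why the whole tower of marks behaves like a single character when you move the cursor over it or select it.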
See also the .NET StringInfo class.