Why do those Thai characters display on the web page with a long tail?

北荒 2021-02-01 16:54

ด้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้дด็็็็็้้้้้็็็็้้้้้็็็็็้้้้้็็็็็้้้้้็็็็็้้้้้

I found some interesting characters just

4 answers

暖寄归人 (original poster)

2021-02-01 17:14
The characters in your example are encoded in UTF-8, which is why each one takes 3 bytes. The respective Unicode code points are:
- DO DEK 0x0e14
- MAI THO 0x0e49
- MAITAIKHU 0x0e47
The latter two are in the category Mn (Mark, Nonspacing), which means they are rendered combined with the preceding code point instead of taking up their own space. MAI THO also has the Canonical_Combining_Class property set to 107; MAITAIKHU has class 0.
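
As a rough illustration of the byte count and category claims, the sketch below prints both for each of the three code points. It relies only on `Encoding.UTF8.GetByteCount` and `CharUnicodeInfo.GetUnicodeCategory`; the `ThaiCodePoints` class and the `Describe` helper are names made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

static class ThaiCodePoints
{
    // Hypothetical helper: summarizes a single BMP character.
    public static string Describe(char c)
    {
        // Encode the one-character string to count its UTF-8 bytes.
        int utf8Bytes = Encoding.UTF8.GetByteCount(c.ToString());
        // General category as .NET reports it (e.g. OtherLetter, NonSpacingMark).
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        return $"U+{(int)c:X4}: {utf8Bytes} UTF-8 bytes, category {category}";
    }

    public static void Main()
    {
        // DO DEK, MAI THO, MAITAIKHU
        foreach (char c in new[] { (char)0x0e14, (char)0x0e49, (char)0x0e47 })
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```

DO DEK should come out as `OtherLetter` and the two marks as `NonSpacingMark`, each at 3 UTF-8 bytes.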

Your example starts with a single base character and piles lots of nonspacing marks on top of it. The renderer stacks each mark on the previous one, and that stack is the long tail you see.

Compare with this C# code:
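```
using System;
using System.Globalization;

// The three code points from the list above.
char DODEK = (char)0x0e14;
char MAITHO = (char)0x0e49;
char MAITAIKHU = (char)0x0e47;

// One base character followed by two nonspacing marks.
string thai = new string(new char[] { DODEK, MAITHO, MAITAIKHU });
// Length counts UTF-16 code units; all three are in the BMP, so it equals the number of code points here.
Console.WriteLine("number of code points: " + thai.Length);

// A text element is what a reader perceives as one character.
var si = new StringInfo(thai);
Console.WriteLine("number of text elements: " + si.LengthInTextElements);
```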
Output:
```
number of code points: 3
number of text elements: 1
```
See also the .NET StringInfo class.
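
To tie this back to the string in the question, here is a small sketch that builds one base character followed by many marks and walks the result with `StringInfo.GetTextElementEnumerator`. The repetition count of 20 and the `LongTail` and `Build` names are arbitrary choices for illustration, not anything taken from the question.

```csharp
using System;
using System.Globalization;
using System.Text;

static class LongTail
{
    // Hypothetical helper: one DO DEK followed by `pairs` repetitions of MAI THO + MAITAIKHU.
    public static string Build(int pairs)
    {
        var sb = new StringBuilder();
        sb.Append((char)0x0e14);
        for (int i = 0; i < pairs; i++)
        {
            sb.Append((char)0x0e49).Append((char)0x0e47);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string tail = Build(20);
        // Walk the string one text element at a time.
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(tail);
        while (e.MoveNext())
        {
            string element = e.GetTextElement();
            Console.WriteLine($"text element at index {e.ElementIndex} spans {element.Length} chars");
        }
    }
}
```

With 20 pairs the string holds 41 chars, yet the loop should report a single text element starting at index 0, since every mark attaches to the one base character before it.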