Transforming string to UTF8

Submitted by 亡梦爱人 on 2020-01-04 14:19:46

Question


I receive a string from an email in C# and want to display it correctly. I know the encoding is coming in as Encoding.Default, and according to this answer I have to convert it to UTF-8, so I tried this code:

byte[] bytes = Encoding.Default.GetBytes(input);
string strResult = Encoding.UTF8.GetString(bytes);

It works, but some characters are not converted. In the webmail interface, the original string is:

باسلام همکار گرامی شماره 53018 مربوط به دبیرخانه ستاد می باشد لطفا اصلاح فرمائید 

When I convert the string with this code, I get this result:

باس �?ا�? �?�?�?ار گرا�?�? �?ا�?�? ش�?ار�? 53018  �?رب�?ط ب�? د ب�?رخا�?�? ستاد �?�? باشد �?طفا اص�?اح فر�?ائ�?د�? 
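The `�?` pairs in this output fit a specific kind of damage: the second UTF-8 byte of each affected Arabic-script letter was turned into `?` (0x3F) before decoding. That is what `Encoding.GetBytes` does for a character the target encoding cannot represent. On .NET Framework, `Encoding.Default` is the system ANSI code page, so the likely culprit is that page lacking some of the input's characters; the question does not confirm this. A minimal sketch of the effect, with the damage simulated by hand rather than reproduced through `Encoding.Default`:

```csharp
// C# 9+ top-level program.
using System;
using System.Text;

// "ل" (U+0644) is encoded in UTF-8 as the two bytes D9 84.
byte[] intact = Encoding.UTF8.GetBytes("\u0644");

// Simulated damage: the second byte replaced by '?' (0x3F).
byte[] damaged = { intact[0], (byte)'?' };

// D9 starts a two-byte sequence, but 3F is not a continuation byte, so the
// decoder emits U+FFFD for D9 and then decodes 3F as a literal '?'.
string decoded = Encoding.UTF8.GetString(damaged);

Console.WriteLine(BitConverter.ToString(intact)); // D9-84
Console.WriteLine(decoded == "\uFFFD?");          // True
```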

Any idea?
Update: the content of the input variable is:

اÙزاÙØ´ تسÙÙÙات \r\n \r\n\r\n باس Ùا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´ÙارÙ
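This input is full of `Ù` and `Ø` (bytes 0xD9 and 0xD8), the UTF-8 lead bytes of most basic Arabic letters. That pattern suggests the mail's UTF-8 bytes were decoded with a single-byte encoding that maps each byte to one char. The sketch below shows how such a string can arise. It assumes ISO-8859-1 (Latin-1, code page 28591) was the decoder; the question does not say which encoding was actually used:

```csharp
// C# 9+ top-level program.
using System;
using System.Linq;
using System.Text;

// Latin-1 maps every byte 0x00-0xFF to the char with the same numeric value.
Encoding latin1 = Encoding.GetEncoding(28591);

string original = "\u0633\u0644\u0627\u0645";        // "سلام"
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original); // D8 B3 D9 84 D8 A7 D9 85

// Decoding UTF-8 bytes with a single-byte encoding yields one char per byte.
string mojibake = latin1.GetString(utf8Bytes);

Console.WriteLine(mojibake.Length);                  // 8
Console.WriteLine(mojibake.All(c => c <= '\u00FF')); // True
```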

Answer 1:


I finally solved the problem. The string holds UTF-8 byte values stored one per 16-bit code unit of a C# string, so we should verify that each code unit is within the range of a byte. First we copy those values into a byte array, and then we decode that UTF-8 byte sequence into UTF-16:

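using System;
using System.Text;

// Each char holds one UTF-8 byte value (0-255); copy it back into a byte array.
byte[] utf8Bytes = new byte[utf8String.Length];
for (int i = 0; i < utf8String.Length; ++i)
    utf8Bytes[i] = checked((byte)utf8String[i]); // throws OverflowException if a char is above 0xFF
// Decode the recovered bytes as UTF-8 to get a proper UTF-16 string.
var result = Encoding.UTF8.GetString(utf8Bytes);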
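For chars in the range U+0000 to U+00FF, the byte-cast loop produces the same bytes as encoding the string with Latin-1 (code page 28591). The loop can therefore be written with a built-in encoding. A sketch, assuming the input really is UTF-8 that was decoded as Latin-1; `RepairLatin1Mojibake` is a hypothetical name. One difference: the Latin-1 encoder substitutes `?` for chars above U+00FF instead of truncating them:

```csharp
// C# 9+ top-level program.
using System;
using System.Text;

// Hypothetical helper: reverses "UTF-8 bytes decoded as Latin-1".
static string RepairLatin1Mojibake(string text)
{
    // Latin-1 maps each char U+0000-U+00FF back to the byte with the same value.
    byte[] bytes = Encoding.GetEncoding(28591).GetBytes(text);
    return Encoding.UTF8.GetString(bytes);
}

string garbled = "\u00D8\u00B3\u00D9\u0084\u00D8\u00A7\u00D9\u0085"; // Latin-1 view of "سلام"
Console.WriteLine(RepairLatin1Mojibake(garbled) == "\u0633\u0644\u0627\u0645"); // True
```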

So for this input:

اÙزاÙØ´ تسÙÙÙات \r\n\r\n\r\n<p>باسÙا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´ÙارÙ&nbsp;53018 &nbsp;ÙربÙØ· ب٠د بÙرخاÙ٠ستاد Ù٠باشد ÙØ·Ùا اصÙاح ÙرÙائÙد\r\n\r\n

I get the correct result:

افزايش تسهيلات \r\n\r\n\r\n<p>باسلام همكار گرامي نامه شماره&nbsp;53018 &nbsp;مربوط به د بيرخانه ستاد مي باشد لطفا اصلاح فرمائيد\r\n\r\n \r\n\r\n
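The conversion above assumes every input is mojibake. If the same code might also receive correctly decoded mail, converting that text again would corrupt it. A guarded variant is sketched below. It relies on a heuristic assumption: a string is treated as repairable only if every char is at most U+00FF and the resulting bytes form valid UTF-8. `TryRepair` is a hypothetical name, and plain ASCII passes the test unchanged:

```csharp
// C# 9+ top-level program.
using System;
using System.Text;

// Hypothetical helper: repairs only text that looks like UTF-8 decoded byte-per-char.
static string TryRepair(string text)
{
    var bytes = new byte[text.Length];
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] > '\u00FF')
            return text;          // cannot be a byte-per-char string; leave it alone
        bytes[i] = (byte)text[i];
    }

    // Second argument (throwOnInvalidBytes) makes the decoder throw instead of inserting U+FFFD.
    var strictUtf8 = new UTF8Encoding(false, true);
    try
    {
        return strictUtf8.GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        return text;              // not valid UTF-8, so probably not mojibake
    }
}

Console.WriteLine(TryRepair("\u00D8\u00B3\u00D9\u0084\u00D8\u00A7\u00D9\u0085") == "\u0633\u0644\u0627\u0645"); // True
Console.WriteLine(TryRepair("caf\u00E9") == "caf\u00E9");                 // True: lone E9 is invalid UTF-8
Console.WriteLine(TryRepair("\u0633\u0644\u0627\u0645") == "\u0633\u0644\u0627\u0645"); // True: already decoded
```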

PS: To replace the line breaks with spaces, I use this code:

result = result.Replace('\r', ' ').Replace('\n', ' ');
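The chained `Replace` calls turn each `\r` and each `\n` into its own space, so `\r\n\r\n\r\n` becomes six spaces. If single spaces are preferred, a regular expression can collapse every whitespace run instead. A sketch; the helper name `CollapseWhitespace` is hypothetical:

```csharp
// C# 9+ top-level program.
using System;
using System.Text.RegularExpressions;

// Replace every run of whitespace (spaces, \r, \n, tabs, ...) with one space, then trim the ends.
static string CollapseWhitespace(string s) => Regex.Replace(s, @"\s+", " ").Trim();

string text = "line one\r\n\r\n\r\nline two\r\n";
Console.WriteLine(CollapseWhitespace(text)); // line one line two
```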


Source: https://stackoverflow.com/questions/31956807/transforming-string-to-utf8
