How is this website fixing the encoding?


情歌与酒 2021-01-07 11:51

I am trying to turn this text:

××•×•×™×¨. ×”×¢×ª×™×“ ×©×œ ×¨×©×ª×•×ª ×—×‘×¨×ª×™×•×ª ×•×”×ª×§×©×•×¨×ª ×©×œ× ×•

Into this text:

6 Answers

北海茫月 (original poster)

2021-01-07 12:15
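If you look closely at the gibberish, you can tell that each Hebrew character is encoded as two characters - it appears that של is encoded as ×©×œ.

This suggests that you are looking at UTF-8 bytes that were decoded with a single-byte encoding, most likely Windows-1252: ×, © and œ are the bytes 0xD7, 0xA9 and 0x9C there, and D7 A9 D7 9C is exactly the UTF-8 encoding of של. Converting the string to UTF-8 will not help, because the wrong characters are already in the string and re-encoding just preserves them.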
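A quick way to test this kind of guess is to reproduce the gibberish on purpose. The sketch below assumes the wrong decoder was Windows-1252 (the question does not say so; it is inferred from the ×©×œ sample). `MojibakeDemo` and `Garble` are made-up names for illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text as UTF-8, then deliberately decodes those bytes
    // with the wrong encoding (Windows-1252, an assumption).
    public static string Garble(string text)
    {
        // On .NET Core / .NET 5+ code page 1252 must be registered first.
        // On .NET Framework it is built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(1252).GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "\u05E9\u05DC" is של; "\u00D7\u00A9\u00D7\u0153" is ×©×œ
        string garbled = Garble("\u05E9\u05DC");
        Console.WriteLine(garbled == "\u00D7\u00A9\u00D7\u0153"); // True
    }
}
```

Each Hebrew letter turns into × plus one more character, which matches the pattern in the question.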

You can read each pair of bytes and reconstruct the original UTF-8 from them.

Here is some C# I came up with - this is very simplistic (doesn't fully work - too many assumptions), but I could see some of the characters converted properly:
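```
using System.Text;

// Simplistic: keeps the low byte of each UTF-16 code unit and decodes the result as UTF-8.
// Works for characters below U+0100 (×, ©) but not for ones such as œ (U+0153):
// its low byte is 0x53, while (assuming Windows-1252 mangling) it stood for byte 0x9C.
private string ToProperHebrew(string gibberish)
{
    // Encoding.Unicode is UTF-16LE: two bytes per code unit, low byte first.
    byte[] orig = Encoding.Unicode.GetBytes(gibberish);
    byte[] heb = new byte[orig.Length / 2];

    for (int i = 0; i < heb.Length; i++)
    {
        heb[i] = orig[i * 2]; // drop the high byte of each pair
    }

    return Encoding.UTF8.GetString(heb);
}
```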
It appears that each byte was re-encoded as two bytes - not sure what encoding was used for this, but discarding one byte seemed to be the right thing for most doubled-up characters.
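A possibly more reliable variant, sketched here under the assumption that the gibberish came from UTF-8 bytes decoded as Windows-1252: encode the string back with code page 1252 to recover the original bytes, then decode those as UTF-8. `MojibakeFix` and `Repair` are hypothetical names, not part of any library.

```csharp
using System;
using System.Text;

public static class MojibakeFix
{
    // Reverses "UTF-8 bytes decoded as Windows-1252" (assumed cause).
    public static string Repair(string gibberish)
    {
        // Needed on .NET Core / .NET 5+; on .NET Framework
        // code page 1252 is available without registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Map each garbled character back to the single byte it came from.
        byte[] originalBytes = Encoding.GetEncoding(1252).GetBytes(gibberish);

        // Those bytes are the original UTF-8 sequence.
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        // "\u00D7\u00A9\u00D7\u0153" is ×©×œ; "\u05E9\u05DC" is של
        string repaired = Repair("\u00D7\u00A9\u00D7\u0153");
        Console.WriteLine(repaired == "\u05E9\u05DC"); // True
    }
}
```

This only works if no bytes were lost along the way. In text that has been copied and pasted, the second character of a pair is sometimes missing (the sample in the question starts with two × in a row), and those letters cannot be recovered from the string alone.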