Question
I was using XVI32 (Hex Editor) to get the hex representation of the Euro symbol and it gives me the value as 80.
Another site: http://www.string-functions.com/string-hex.aspx does the same.
I am not able to understand why the hex representation is 80 instead of 0x20AC.
0x80 is 128 in decimal, and typing Alt+0128 does actually produce the Euro symbol.
Could somebody shed some light on the logic behind this string-to-hex conversion?
Thanks
Answer 1:
A character encoding (or charset) maps characters to a sequence of byte values. Your charset is windows-1252, which encodes the euro symbol as the single hex byte 0x80 (which is 128 in decimal, as Oded says). Each charset encodes non-ASCII characters differently; there's nothing fundamentally "right" or "wrong" about that 0x80.
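To make the distinction between a code point and its encoded bytes concrete, here is a minimal Python sketch. The variable names are only illustrative, and `cp1252` is Python's codec name for windows-1252.

```python
EURO = "\u20ac"  # the euro sign, written as an escape to avoid source-encoding issues

code_point = ord(EURO)                 # the Unicode code point (a number, not bytes)
cp1252_bytes = EURO.encode("cp1252")   # the bytes stored under windows-1252
utf8_bytes = EURO.encode("utf-8")      # the bytes stored under UTF-8

print(hex(code_point))      # 0x20ac
print(cp1252_bytes.hex())   # 80
print(utf8_bytes.hex())     # e282ac
```

A hex editor shows the stored bytes, so in a windows-1252 file it shows 80, not the code point 20AC.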
Answer 2:
128 in decimal is 80 in Hexadecimal.
edit: and 0x20AC would be 8364 in decimal.
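The two conversions above can be checked quickly in Python; this is just the arithmetic, with illustrative variable names:

```python
# Hexadecimal string -> decimal integer
windows_value = int("80", 16)
unicode_value = int("20AC", 16)

print(windows_value)   # 128
print(unicode_value)   # 8364

# Decimal integer -> hexadecimal string
print(hex(128))        # 0x80
print(hex(8364))       # 0x20ac
```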
According to this page, 128 is incorrect for UTF-8 (or any other Unicode encoding), but right for windows-1252. (iso-8859-15 also has the euro sign, though at a different position, 0xA4.)
Typically, if you use, on Windows, a keyboard key labeled with the euro sign, the raw octet 128 is what you actually produce and insert into a file .... Such a method is formally correct if the document is accompanied with information that specifies an encoding where the data maps to the character in question. This would mean, windows-1252 or iso-8859-15 encoding, respectively, which should be specified in the HTTP headers.
Answer 3:
Unicode came into the picture quite late as a way of encoding characters (around 1992-93). Before that, OEMs used their own specific encodings. On Windows you have many encodings, each specific to a locale. Under the "Windows: Western" encoding, you therefore get 0x80 for the euro. Unicode, however, collects currency symbols in the Currency Symbols block of the BMP (U+20A0 to U+20CF). So under Unicode the euro currency symbol is U+20AC, while under the Windows encoding the same symbol is 0x80.
To see the difference, open charmap.exe on a Windows machine and check "Advanced view". From the character sets, select "Windows: Western". There you'll see the reason.
For more info, see https://en.wikipedia.org/wiki/Windows-1252
Answer 4:
The reason you see different results is character encodings:
The number 0x20AC is the Unicode code point for the euro symbol. Depending on the encoding used, you get different byte values. On Windows you usually have something like cp1252 (for German, for example), which is quite similar to ISO-8859-1 but, unlike it, contains the euro symbol at 0x80.
So the hex code you get for the euro symbol depends on the encoding of the data you are looking at. The mapping files provided by unicode.org show some of the various encodings available: http://unicode.org/Public/MAPPINGS/
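As a rough illustration of how the bytes depend on the encoding, the following Python sketch encodes the euro sign under a handful of encodings. The helper name `euro_bytes` and the choice of encodings are assumptions made for this example.

```python
EURO = "\u20ac"  # the euro sign

def euro_bytes(encoding):
    """Return the euro sign's bytes in `encoding` as a hex string,
    or None if that encoding has no euro sign at all."""
    try:
        return EURO.encode(encoding).hex()
    except UnicodeEncodeError:
        return None

for name in ("cp1252", "iso-8859-15", "iso-8859-1", "utf-8", "utf-16-be"):
    print(name, euro_bytes(name))
```

ISO-8859-1 yields `None` here because it predates the euro and has no slot for it; cp1252 and ISO-8859-15 each added one, at different positions.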
Answer 5:
I guess that on your machine (and on that site), the local code page is such that code 128 maps to the Euro symbol. On my machine Alt+0128 maps to the Hebrew character Alef, because it's set to a different code page.
You can see the Unicode code for Euro by typing
javascript:alert("€".charCodeAt(0))
in your browser's address bar.
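The code-page dependence can also be seen from the other direction, by decoding the single byte 128 under different code pages. This is a sketch only: the answer does not say which code page the machine uses, so cp862 (a DOS Hebrew code page) is an assumption, picked because it places Alef at 128. The helper name `decode_128` is likewise hypothetical.

```python
import unicodedata

RAW = bytes([0x80])  # the single byte with value 128

def decode_128(code_page):
    """Decode byte 0x80 using the given code page."""
    return RAW.decode(code_page)

for cp in ("cp1252", "cp1255", "cp862", "cp437"):
    ch = decode_128(cp)
    # Print the code point and name rather than the glyph itself,
    # so the output does not depend on the terminal's encoding.
    print(cp, f"U+{ord(ch):04X}", unicodedata.name(ch))
```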
Answer 6:
0x20AC should be the correct one, since the euro symbol is an (extended) Unicode character.
The fact that pressing Alt+0128 produces the euro symbol has nothing to do with this (you're probably doing it on Windows? Then it's a Windows-specific thing).
0x80 or 128 is not a valid HTML code (the behavior is undefined): http://www.ascii.cl/htmlcodes.htm
Read more on: http://www.cs.tut.fi/~jkorpela/html/euro.html
Source: https://stackoverflow.com/questions/4640354/hex-representation-of-euro-symbol-%e2%82%ac