What 8-bit encodings use the C1 range for characters? (x80—x9F or 128—159)

Submitted by 北城以北 on 2019-12-11 09:27:36

Question


Wikipedia has a listing of the x80—x9F "C1" range under Latin 1 Supplement for Unicode. This range is also reserved in the ISO-8859-1 codepage.

I'm looking at a file of strings, all of which are within the 7-bit ASCII range except for a few instances of \x96 where it looks like a dash would be, such as the middle of a street address.

I don't know if other characters in the C1 range might eventually show up in the data, so I'd like to know if there's a correct way to read the file. Are there any 8-bit encodings which use x80 through x9F for character data instead of terminal control characters?


Answer 1:


There is a large number (potentially an infinite number) of 8-bit encodings that assign graphic characters to some or all bytes in the range 0x80 to 0x9F. Several encodings defined by Microsoft have U+2013 EN DASH “–” at byte position 0x96, and this character could conceivably appear in a street address, especially between numbers.
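As a rough illustration of the point about byte 0x96, the sketch below decodes that single byte with a few of Python's built-in codecs. The helper name `describe_byte` and the choice of codecs are assumptions made for this example, not something from the original question.

```python
import unicodedata


def describe_byte(byte_value, encoding):
    """Return (character, Unicode name) for one byte under a codec.

    Returns None when the codec leaves that byte position undefined.
    describe_byte is a hypothetical helper name used only in this sketch.
    """
    try:
        ch = bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None
    # Control characters have no Unicode name, so supply a fallback label.
    return ch, unicodedata.name(ch, "<unnamed control>")


# Compare a few Windows code pages with ISO-8859-1 (latin-1).
for enc in ("cp1252", "cp1250", "cp1251", "latin-1"):
    print(enc, describe_byte(0x96, enc))
```

The three Windows code pages report EN DASH here, while latin-1 yields the C1 control U+0096. Note that Python's cp1252 codec leaves a few positions in this range (0x81, for example) undefined, so a strict decode of arbitrary C1 bytes can still raise an error.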

On the other hand, e.g. MacRoman has the letter “ñ” at position 0x96, and it could well appear within a street name in Spanish, for example.
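To see how much the guess matters, the same bytes can be decoded under both candidate encodings and compared by eye. This is only a sketch: the function name `decode_candidates` and the sample street name are made up for illustration.

```python
def decode_candidates(raw, encodings=("cp1252", "mac_roman")):
    """Decode the same bytes under several candidate encodings.

    decode_candidates is a hypothetical helper; errors="replace" keeps the
    comparison going even when a codec has no mapping for some byte.
    """
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


# Hypothetical sample: a Spanish street name with 0x96 in the middle.
sample = b"Calle Espa\x96a 12"
for enc, text in decode_candidates(sample).items():
    print(f"{enc:10} {text}")
```

Here the mac_roman reading ("Calle España 12") is plausible and the cp1252 reading is not, whereas for something like `b"12\x9614"` the opposite holds. Neither decode fails, so only the context can tell them apart.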

For a rational analysis of the situation, you should inspect the data as a whole, possibly using a filter that finds all bytes outside the ASCII range 0x00 to 0x7F, look at the contexts in which the characters appear, and try to find technical information about the origin of the data.




Answer 2:


In Windows-1252, byte 0x96 is an en dash, which is slightly different from the hyphen-minus (0x2D).

http://www.ascii-code.com/



Source: https://stackoverflow.com/questions/18410167/what-8-bit-encodings-use-the-c1-range-for-characters-x80-x9f-or-128-159
