How do I translate 8-bit characters into 7-bit characters? (e.g. Ü to U)

旧时难觅i 2020-12-05 10:29

I'm looking for pseudocode or sample code to convert high-bit (extended ASCII) characters, such as Ü (extended ASCII 154), into their closest 7-bit equivalents, such as U (ASCII 85).

My initial gues

15 answers
  • 2020-12-05 11:08

    It really depends on the nature of your source strings. If you know the string's encoding, and you know that it is an 8-bit encoding such as ISO Latin 1 (ISO 8859-1), then a simple static array is sufficient:

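    /* Indexes are ISO 8859-1 byte values; literals such as 'é' are not portable array indexes. */
    static const char xlate[256] = { ..., [0xE9] = 'e' /* é */, ..., [0xDC] = 'U' /* Ü */, ... };
    ...
    new_c = xlate[(unsigned char)old_c];  /* cast so a signed char cannot give a negative index */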
    

    On the other hand, if you have a different encoding, or if you're using UTF-8 encoded strings, you will probably find the functions in the ICU library very helpful.
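
    As a rough illustration of the static-table approach, here is a minimal sketch that assumes the input is ISO 8859-1. The names latin1_to_ascii and fold_latin1 are hypothetical, the table lists only a handful of letters, and replacing unmapped high bytes with '?' is just one possible policy.

```c
/* Hypothetical table: ISO 8859-1 byte value -> closest ASCII letter.
   Only a few entries are filled in; 0 means "no mapping defined". */
static const char latin1_to_ascii[256] = {
    [0xC0] = 'A', [0xC1] = 'A', [0xC2] = 'A', [0xC4] = 'A', /* À Á Â Ä */
    [0xC8] = 'E', [0xC9] = 'E', [0xCA] = 'E',               /* È É Ê */
    [0xD6] = 'O', [0xDC] = 'U',                             /* Ö Ü */
    [0xE0] = 'a', [0xE4] = 'a',                             /* à ä */
    [0xE8] = 'e', [0xE9] = 'e',                             /* è é */
    [0xF6] = 'o', [0xFC] = 'u',                             /* ö ü */
};

/* Fold an ISO 8859-1 string to 7-bit ASCII in place. */
void fold_latin1(char *s)
{
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if (c < 0x80)
            continue;                /* already 7-bit, leave untouched */
        char mapped = latin1_to_ascii[c];
        *s = mapped ? mapped : '?';  /* assumption: unmapped bytes become '?' */
    }
}
```

    Because every input byte maps to exactly one output byte, the conversion can be done in place. This sketch does not apply to UTF-8 input, where an accented letter occupies more than one byte.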

  • 2020-12-05 11:11

    I think you just can't.

    I usually do something like this:

    AccentString = 'ÀÂÄÉÈÊ[and all the other]'
    ConvertString = 'AAAEEE[and all the other]'

    Then look up each character in AccentString and replace it with the character at the same index in ConvertString.
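
    The parallel-string lookup might be sketched in C as follows, assuming the text is in a single-byte encoding (ISO 8859-1 here, which is an assumption). The function name strip_accents is hypothetical, and the two strings contain only the six letters shown above.

```c
#include <string.h>

/* Parallel strings: accents[i] is replaced by plain[i].
   Bytes are ISO 8859-1 (assumed): À Â Ä É È Ê. */
static const char accents[] = "\xC0\xC2\xC4\xC9\xC8\xCA";
static const char plain[]   = "AAAEEE";

/* Replace each listed accented byte in s with its plain counterpart. */
void strip_accents(char *s)
{
    for (; *s != '\0'; ++s) {
        const char *hit = strchr(accents, *s);
        if (hit != NULL)
            *s = plain[hit - accents];  /* same index in the other string */
    }
}
```

    Each character costs a linear search through accents, so this is slower than a 256-entry table, but the two strings are easy to read and extend.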

    HTH

  • 2020-12-05 11:12

    Most languages have a conventional way to replace accented characters with plain ASCII, but the rules depend on the language, and they often replace a single accented character with two ASCII ones. For example, in German ü becomes ue. So if you want to handle natural languages properly, it's a lot more complicated than you think it is.
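
    Because one input character can expand to two, the output needs its own buffer. Here is a minimal sketch for the German case, assuming ISO 8859-1 input; german_expansion and transliterate_german are hypothetical names, and other high bytes are copied through unchanged, so the result is only pure ASCII for text covered by these rules.

```c
#include <stddef.h>
#include <string.h>

/* German replacement for one ISO 8859-1 byte; NULL means "no rule". */
static const char *german_expansion(unsigned char c)
{
    switch (c) {
    case 0xC4: return "Ae";  /* Ä */
    case 0xD6: return "Oe";  /* Ö */
    case 0xDC: return "Ue";  /* Ü */
    case 0xE4: return "ae";  /* ä */
    case 0xF6: return "oe";  /* ö */
    case 0xFC: return "ue";  /* ü */
    case 0xDF: return "ss";  /* ß */
    default:   return NULL;
    }
}

/* Write the transliteration of src into dst (capacity dst_size bytes,
   including the terminating NUL). Returns 0 on success, -1 if dst is
   too small. */
int transliterate_german(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    for (; *src != '\0'; ++src) {
        char single[2] = { *src, '\0' };
        const char *rep = german_expansion((unsigned char)*src);
        if (rep == NULL)
            rep = single;            /* no rule: copy the byte unchanged */
        size_t len = strlen(rep);
        if (out + len >= dst_size)
            return -1;               /* keep room for the final NUL */
        memcpy(dst + out, rep, len);
        out += len;
    }
    if (dst_size == 0)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

    A destination buffer of twice the source length plus one byte is always large enough for these rules, since no replacement is longer than two characters.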

  • 2020-12-05 11:14

    A lookup array is probably the simplest and fastest way to accomplish this. It is the same technique you could use to convert, say, ASCII to EBCDIC.

  • 2020-12-05 11:15

    Is converting Ü to U really what you want to do? I don't know about other languages, but in German Ü would become Ue, ö would become oe, and so on.

  • 2020-12-05 11:15

    Try the uni2ascii program.
