How do I translate 8-bit characters into 7-bit characters? (e.g. Ü to U)

旧时难觅i 2020-12-05 10:29

I'm looking for pseudocode or sample code to convert higher-bit ASCII characters (like Ü, which is extended ASCII 154 in code page 437) into U (which is ASCII 85).

My initial gues

15 answers
  • 2020-12-05 11:16

    There is an article on CodeProject that looks good.

    The conversion using code page 1251 also interests me (see the other answer).

    I don't like conversion tables: Unicode has so many characters that it is easy to miss one.
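    To make the coverage problem concrete, here is a small Python sketch (Python only for illustration). The table `UMLAUTS` and the helpers `table_to_ascii` and `unmapped` are hypothetical: the table handles the German umlauts, but any accented letter it does not list passes through unchanged.

```python
# Hypothetical hand-written table: it covers only the German umlauts.
UMLAUTS = {"Ä": "A", "Ö": "O", "Ü": "U", "ä": "a", "ö": "o", "ü": "u"}


def table_to_ascii(text: str) -> str:
    """Replace characters found in the table; leave all others unchanged."""
    return "".join(UMLAUTS.get(ch, ch) for ch in text)


def unmapped(text: str) -> list:
    """Return the non-ASCII characters (code point > 127) that survive the table lookup."""
    return [ch for ch in table_to_ascii(text) if ord(ch) > 127]


print(table_to_ascii("Über"))     # Uber
print(unmapped("Ürümqi Dvořák"))  # ['ř', 'á']
```

    Every missed letter has to be noticed and added by hand, which is exactly the maintenance burden described above.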

  • 2020-12-05 11:17

    Hm, why not just convert the string with iconv, using a target encoding such as ASCII//TRANSLIT so that characters like Ü are transliterated instead of rejected?

  • 2020-12-05 11:18

    For .NET users, the CodeProject article (thanks to GvS's tip) answers the question more correctly than anything else I've seen so far.

    However, the code in that article (solution #1) is cumbersome. Here's a compact version:

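    using System.Linq;   // Where, ToArray
    using System.Text;   // NormalizationForm

    // Class name is illustrative only.
    static class AsciiHelper
    {
        // Based on http://www.codeproject.com/Articles/13503/Stripping-Accents-from-Latin-Characters-A-Foray-in
        public static string LatinToAscii(string inString)
        {
            // FormKD splits a precomposed letter such as "Ü" into its base letter "U"
            // followed by a combining mark (U+0308), which lies outside 7-bit ASCII.
            string decomposed = inString.Normalize(NormalizationForm.FormKD);

            // Keep only code points below 128, i.e. drop the combining marks.
            return new string(decomposed.Where(x => x < 128).ToArray());
        }
    }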
    

    To expand a bit on the answer: this method uses String.Normalize, which, per the MSDN docs:

    Returns a new string whose textual value is the same as this string, but whose binary representation is in the specified Unicode normalization form.

    Specifically, we use NormalizationForm.FormKD, which the same MSDN docs describe as follows:

    FormKD - Indicates that a Unicode string is normalized using full compatibility decomposition.

    For more information about Unicode normalization forms, see Unicode Standard Annex #15.
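    For readers outside .NET, here is a rough Python equivalent, offered as a sketch rather than a port of the CodeProject code. It assumes Python's `unicodedata.normalize` with `"NFKD"`, the standard name for the same form as .NET's FormKD. The helper name `latin_to_ascii` is hypothetical. The sketch also shows a limitation of the approach: letters that have no decomposition, such as ß or Ø, are silently dropped rather than transliterated.

```python
import unicodedata


def latin_to_ascii(text: str) -> str:
    """Hypothetical Python counterpart of LatinToAscii: NFKD-decompose, keep code points < 128."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ord(ch) < 128)


# NFKD splits a precomposed letter into base letter + combining mark.
print([hex(ord(c)) for c in unicodedata.normalize("NFKD", "Ü")])  # ['0x55', '0x308']

# The "compatibility" part also maps ligatures and superscripts to plain forms.
print(unicodedata.normalize("NFKD", "ﬁ²"))  # fi2

print(latin_to_ascii("Ürümqi Dvořák"))  # Urumqi Dvorak

# Letters without a decomposition (ß, Ø) vanish instead of becoming "ss" or "O".
print(latin_to_ascii("Straße Øre"))  # Strae re
```

    If such letters matter, a small table for just the non-decomposable ones can complement normalization, while normalization handles the bulk of the accented characters.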
