How to check if a locale is UTF-8?

北恋 2021-01-15 05:40

I'm working with Yocto to create an embedded Linux distribution for an ARM device (i.MX 6Quad Processors).

I've configured the list of desired locales with the var

1 answer
  •  北海茫月
    2021-01-15 06:25

    LC_IDENTIFICATION doesn't tell you much:

    LC_IDENTIFICATION - this is not a user-visible category, it contains information about the locale itself and is rarely useful for users or developers (but is listed here for completeness sake).

    You'd have to look at the complete set of files.

    There appears to be no standard command-line utility for doing this, but there is a runtime call (added a little later than the original locale functions). Here is a sample program which illustrates the function nl_langinfo:

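    #include <stdio.h>
    #include <locale.h>
    #include <langinfo.h>
    
    int
    main(int argc, char **argv)
    {
        int n;
        /* Treat each command-line argument as a locale name. */
        for (n = 1; n < argc; ++n) {
            /* setlocale returns NULL if the locale cannot be selected. */
            if (setlocale(LC_ALL, argv[n]) != NULL) {
    
                /* Ask for the codeset name of the locale just selected. */
                char *code = nl_langinfo(CODESET);
                if (code != NULL)
                    printf("%s ->%s\n", argv[n], code);
                else
                    printf("? %s (nl_langinfo)\n", argv[n]);
            } else {
                printf("? %s (setlocale)\n", argv[n]);
            }
        }
        return 0;
    }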
    

    and some output, e.g., from running it as foo $(locale -a):

    aa_DJ ->ISO-8859-1
    aa_DJ.iso88591 ->ISO-8859-1
    aa_DJ.utf8 ->UTF-8
    aa_ER ->UTF-8
    aa_ER@saaho ->UTF-8
    aa_ER.utf8 ->UTF-8
    aa_ER.utf8@saaho ->UTF-8
    aa_ET ->UTF-8
    aa_ET.utf8 ->UTF-8
    af_ZA ->ISO-8859-1
    af_ZA.iso88591 ->ISO-8859-1
    af_ZA.utf8 ->UTF-8
    am_ET ->UTF-8
    am_ET.utf8 ->UTF-8
    an_ES ->ISO-8859-15
    an_ES.iso885915 ->ISO-8859-15
    an_ES.utf8 ->UTF-8
    ar_AE ->ISO-8859-6
    ar_AE.iso88596 ->ISO-8859-6
    ar_AE.utf8 ->UTF-8
    ar_BH ->ISO-8859-6
    ar_BH.iso88596 ->ISO-8859-6
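
    To turn that listing into a yes/no answer, the codeset string can be compared against the spellings of UTF-8. The following is only a minimal sketch: codeset_is_utf8 and locale_is_utf8 are hypothetical helper names, not part of any library, and the assumption that the name may appear as "UTF-8", "utf8" or similar is mine.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if a codeset name spells UTF-8,
 * ignoring case and any '-' characters (so "UTF-8", "utf-8" and
 * "utf8" all match), else 0. */
int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "utf8";
    size_t i = 0;

    if (codeset == NULL)
        return 0;
    for (; *codeset != '\0'; ++codeset) {
        if (*codeset == '-')
            continue;                 /* hyphens are not significant */
        if (want[i] == '\0' ||
            tolower((unsigned char)*codeset) != want[i])
            return 0;                 /* too long, or wrong character */
        ++i;
    }
    return want[i] == '\0';           /* 1 only if all of "utf8" was seen */
}

/* Hypothetical helper: return 1 if the named locale's codeset is UTF-8,
 * 0 if it is something else, and -1 if setlocale() rejects the name.
 * Side effect: changes the program's LC_CTYPE locale. */
int locale_is_utf8(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

    On the system that produced the listing above, locale_is_utf8("aa_DJ.utf8") would be expected to return 1 and locale_is_utf8("aa_DJ") to return 0. Only LC_CTYPE is set here because that is the category that determines CODESET.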
    

    The directory names you're referring to are often (but are not required to be) the same as encoding names. That is the assumption made in the example program. There was a related question, "How to get terminal's Character Encoding", but it has no useful answers. One is interesting though, since it asserts that

    locale charmap
    

    will give the locale encoding. According to the standard, that's not necessarily so:

    • The command locale charmap gives the name used in localedef -f.

    • However, localedef attaches no special meaning to the name given in the -f option.

    • localedef has a different option -u which identifies the codeset, but locale (in the standard) mentions no method for displaying this information.

    As usual, implementations may (or may not) treat unspecified features in different ways. The GNU C library's documentation differs in some respects from the standard (see locale and localedef), but offers no explicit options for showing the codeset name.
