How to use UTF-8 in C code?

耶瑟儿~ 2021-02-04 03:06

My setup: gcc-4.9.2, UTF-8 environment.

The following C program works with ASCII input, but not with UTF-8.

Create input file:

echo -n 'привет мир'


        
5 answers
  •  天涯浪人
    2021-02-04 03:35

    Siddhartha Ghosh's answer gives you the basic problem. Fixing your code requires more work, though.

    I used the following script (chk-utf8-test.sh):

    echo -n 'привет мир' > вход
    make utf8-test
    ./utf8-test
    grep -q 'привет мир' выход && echo OK
    

    I called your program utf8-test.c and amended the source like this, removing the references to /tmp, and being more careful with lengths:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define SIZE 40
    
    int main(void)
    {
        char buf[SIZE + 1];
        char *pat = "привет мир";
        char str[SIZE + 2];
    
        FILE *f1 = fopen("вход", "r");
        FILE *f2 = fopen("выход", "w");
    
        if (f1 == 0 || f2 == 0)
        {
            fprintf(stderr, "Failed to open one or both files\n");
            return(1);
        }
    
        size_t nbytes;
        if ((nbytes = fread(buf, 1, SIZE, f1)) > 0)
        {
            buf[nbytes] = 0;
    
            if (strncmp(buf, pat, nbytes) == 0)
            {
                sprintf(str, "%.*s\n", (int)nbytes, buf);
                fwrite(str, 1, nbytes, f2);
            }
        }
    
        fclose(f1);
        fclose(f2);
    
        return(0);
    }
    

    And when I ran the script, I got:

    $ bash -x chk-utf8-test.sh
    + '[' -f /etc/bashrc ']'
    + . /etc/bashrc
    ++ '[' -z '' ']'
    ++ return
    + alias 'r=fc -e -'
    + echo -n 'привет мир'
    + make utf8-test
    gcc -O3 -g -std=c11 -Wall -Wextra -Werror utf8-test.c -o utf8-test
    + ./utf8-test
    + grep -q 'привет мир' $'в?\213?\205од'
    + echo OK
    OK
    $
    

    For the record, I was using GCC 5.1.0 on Mac OS X 10.10.3.
