reading first line in a file gives me a “\357\273\277” prefix in the first row [duplicate]

荒凉一梦 submitted on 2019-12-12 16:25:07

Question


When I call the function readTheNRow with row=0 (that is, I read the first row), I find that the first three chars are \357, \273 and \277. I found that this prefix is somehow related to UTF-8 files, but some files have this prefix and some don't :(. How do I ignore all such prefixes in the files that I want to read?

int readTheNRow(char buff[], int row) {

int file = open("my_file.txt", O_RDONLY);
if (file < 0) {
    write(2, "opening file was unsuccessful\n", 30);
    exit(-1);
}

// function's variables
int i = 0;
char ch; // a temp variable to read with it
int check; // helping variable for checking the read function

// read till we reach the needed row
while (i != row) {

    // read one char
    check = read(file, &ch, 1);
    if (check < 0) {
        // write a error message to the user
        write(2, "error occurred in reading\n", 26);
        exit(-1);
    }

    if (check == 0) {
        // read() returned 0: we reached the end of file
        close(file);
        return -1; // couldn't read row N (the file has fewer rows than N)
    }
    printf("%c",ch);
    // check that the char is a \n
    if (ch == '\n') {
        i++;
    }
}

// read the number to the received buffer
i = 0;

do {
    // read one char
    check = read(file, buff + i, 1);
    if (check < 0) {
        // write a error message to the user
        write(2, "error occurred in reading\n", 26);
        exit(-1);
    }

    // if we reached the end of file
    if (check == 0) {
        break;
    }
    i++;

} while (buff[i - 1] != '\n');

// put the \0 at the end of the string, replacing the '\n' if there is one
if (i > 0 && buff[i - 1] == '\n') {
    i--;
}
buff[i] = '\0';

// try to close the file before returning
if (close(file) < 0) {
    write(2, "closing fifo was unsuccessful\n", 30);
    exit(-1);
}
return 1; // return that reading was successful
}

Answer 1:


You seem to be trying to read a file carrying a so-called BOM (Byte Order Mark).

Test for such a prefix; if one is present, use the information it carries, then go on and read the file, interpreting it as the BOM indicates.

The sequence \357 \273 \277 (octal for the bytes 0xEF 0xBB 0xBF) indicates that UTF-8 follows. UTF-8 does not need to take byte ordering into account, because the byte is the unit for such files.

More on the various existing BOMs here: http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
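
As a rough illustration of the "test for the prefix" step, here is a minimal sketch using the same POSIX calls as the question. The helper name skip_utf8_bom is hypothetical (it is not in the original code), it only handles the UTF-8 BOM, and it assumes the descriptor refers to a seekable regular file, since it rewinds with lseek when no BOM is found.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * skip_utf8_bom: hypothetical helper, not part of the original code.
 * Call it right after open(). It leaves the file offset just past a
 * UTF-8 BOM (bytes 0xEF 0xBB 0xBF, i.e. \357 \273 \277) if one is
 * present, and at the start of the file otherwise.
 * Assumes fd is seekable (a regular file, not a pipe or FIFO).
 * Returns 1 if a BOM was skipped, 0 if there was none, -1 on error.
 */
int skip_utf8_bom(int fd) {
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t got = 0;

    /* read() may return fewer bytes than requested, so loop until
       we have 3 bytes or hit end of file */
    while (got < sizeof head) {
        ssize_t n = read(fd, head + got, sizeof head - got);
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            break; /* file is shorter than 3 bytes */
        }
        got += (size_t)n;
    }

    if (got == sizeof head && memcmp(head, bom, sizeof bom) == 0) {
        return 1; /* offset is now just past the BOM */
    }

    /* no BOM: rewind so the caller sees the file from its first byte */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        return -1;
    }
    return 0;
}
```

In readTheNRow this could be called once, right after the open() check and before the first read(), so that row 0 starts at the first real character whether or not the file carries the prefix. Files with a UTF-16 or UTF-32 BOM would need more than skipping, because their contents have to be decoded differently.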



Source: https://stackoverflow.com/questions/24096871/reading-first-line-in-a-file-gives-me-a-357-273-277-prefix-in-the-first-row
