Question
When I call the function readTheNRow with row = 0 (that is, I read the first row), I find that the first three chars are \357, \273 and \277. I found that this prefix is somehow related to UTF-8 files, but some files have this prefix and some don't. How do I ignore all such prefixes in the files I want to read?
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int readTheNRow(char buff[], int row) {
    int file = open("my_file.txt", O_RDONLY);
    if (file < 0) {
        write(2, "opening file was unsuccessful\n", 30);
        exit(-1);
    }
    // function's variables
    int i = 0;
    char ch;   // a temp variable to read into
    int check; // helper variable for checking the result of read
    // read until we reach the needed row
    while (i != row) {
        // read one char
        check = read(file, &ch, 1);
        if (check < 0) {
            // write an error message to the user
            write(2, "error occurred in reading\n", 26);
            exit(-1);
        }
        if (check == 0) {
            // this means that we reached the end of the file
            close(file);
            return -1; // couldn't read row N (the file has fewer rows)
        }
        printf("%c", ch);
        // check whether the char is a \n
        if (ch == '\n') {
            i++;
        }
    }
    // read the number into the received buffer
    i = 0;
    do {
        // read one char
        check = read(file, buff + i, 1);
        if (check < 0) {
            // write an error message to the user
            write(2, "error occurred in reading\n", 26);
            exit(-1);
        }
        // if we reached the end of file
        if (check == 0) {
            break;
        }
        i++;
    } while (buff[i - 1] != '\n');
    // put the \0 at the end of the string, replacing the \n if there is one
    if (i > 0 && buff[i - 1] == '\n') {
        i--;
    }
    buff[i] = '\0';
    // try to close the file before returning
    if (close(file) < 0) {
        write(2, "closing fifo was unsuccessful\n", 30);
        exit(-1);
    }
    return 1; // return that reading was successful
}
Answer 1:
You seem to be reading a file that starts with a so-called BOM (byte order mark).
Test for such a prefix and, if one is present, use the information it carries; then go on and read the rest of the file, interpreting it as the BOM indicates.
The sequence \357 \273 \277 (octal for the bytes 0xEF 0xBB 0xBF) indicates that UTF-8 follows. UTF-8 does not need to take byte ordering into account, as the byte is the unit for such files.
More on the various existing BOMs here: http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
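As a rough illustration of the "test for the prefix, then go on reading" advice, here is a minimal sketch that handles only the UTF-8 BOM. The helper name skip_utf8_bom is hypothetical (it is not from the original post), and the sketch assumes the descriptor refers to a seekable regular file positioned at offset 0; it would not work as written on a pipe or FIFO.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the original post.
 * If the file behind fd starts with the UTF-8 BOM (bytes 0xEF 0xBB 0xBF,
 * i.e. \357 \273 \277), leave the file offset just past it; otherwise
 * rewind to the start so that no data is lost.
 * Returns 1 if a BOM was skipped, 0 if none was found, -1 on error.
 * Assumption: fd is seekable and currently at offset 0. */
int skip_utf8_bom(int fd) {
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];

    ssize_t n = read(fd, head, sizeof head);
    if (n < 0) {
        return -1; /* read failed */
    }
    if (n == 3 && memcmp(head, bom, sizeof bom) == 0) {
        return 1; /* BOM consumed; offset now points at the first real byte */
    }
    /* No BOM (or fewer than 3 bytes were read): go back to the beginning */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        return -1;
    }
    return 0;
}
```

In the question's function, one option would be to call skip_utf8_bom(file) right after the open succeeds, so that both the row-skipping loop and the row-reading loop start at the first real character whether or not the file carries the prefix. Files in other encodings (for example UTF-16) carry different BOMs and would need their own checks and decoding.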
Source: https://stackoverflow.com/questions/24096871/reading-first-line-in-a-file-gives-me-a-357-273-277-prefix-in-the-first-row