Question
I'm trying to read lines from .txt files that have been saved as Unicode. This is how I'm doing it:
wifstream input;
string path = "test.txt";
input.imbue(locale(input.getloc(),
    new codecvt_utf16<wchar_t, 0x10ffff, consume_header>));
input.open(path);
if (input.is_open())
{
    wstring line;
    input.seekg(1, ios_base::beg);
    getline(input, line);
}
It works fine for files with Latin characters. But for Cyrillic files I get weird symbols instead of spaces and adjacent characters.
For example:
What is in the input file:
Госдеп США осудил нападение на
What I get:
︓осдепР!ШАР>судилР=ападениеР=а
What am I doing wrong?
Answer 1:
One line in your code looks very suspicious:
input.seekg(1, ios_base::beg);
It sets the file position to 1, so decoding of the UTF-16 text starts one byte into the file and the BOM is read incorrectly. I get the same result with a little-endian UTF-16 file.
Change the position to 0, or delete this line, to make the code work.
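A minimal sketch of the corrected read, assuming the file is UTF-16 and starts with a BOM. The helper names `read_first_line_utf16` and `write_sample_utf16le` are hypothetical and exist only to make the example self-contained, and opening in binary mode is an addition that is not in the question. The garbled output is consistent with a one-byte shift: a little-endian file holding the sample text starts with the bytes FF FE 13 04 3E 04, and starting at byte 1 pairs them as FE13, 043E, and so on, which decode as "︓", "о", ... when no BOM is seen and big-endian order is assumed.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf16 (deprecated since C++17, still widely shipped)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: returns the first line of a UTF-16 file that starts with a BOM.
std::wstring read_first_line_utf16(const std::string& path) {
    std::wifstream input;
    // consume_header lets the facet read the BOM and take the byte order from it.
    input.imbue(std::locale(input.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    // Binary mode (an assumption, not in the question) avoids newline translation
    // of the raw bytes before they reach the facet.
    input.open(path, std::ios::binary);

    std::wstring line;
    if (input.is_open()) {
        // No seekg: decoding has to start at byte 0 so the BOM is seen.
        std::getline(input, line);
    }
    return line;
}

// Hypothetical helper for trying this out: writes "Госдеп США" plus a newline
// as UTF-16LE with a BOM.
void write_sample_utf16le(const std::string& path) {
    const unsigned char bytes[] = {
        0xFF, 0xFE,                                      // BOM, little endian
        0x13, 0x04, 0x3E, 0x04, 0x41, 0x04,              // Г о с
        0x34, 0x04, 0x35, 0x04, 0x3F, 0x04,              // д е п
        0x20, 0x00,                                      // space
        0x21, 0x04, 0x28, 0x04, 0x10, 0x04,              // С Ш А
        0x0A, 0x00                                       // newline
    };
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes), sizeof bytes);
    assert(out.good());  // the sample file must have been written
}
```

Calling `write_sample_utf16le("sample.txt")` and then `read_first_line_utf16("sample.txt")` should give back the Cyrillic text without the stray symbols. If the file uses CRLF line endings, the returned line will still end in a carriage return that the caller may want to strip.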
Answer 2:
Well, I figured out a way:
FILE *input= _wfopen(L"test.txt", L"rb");
wchar_t line[1000];
fgetws(line, 1000, input);
Works fine like that. Was quite stupid of me not to try it first. So thanks everyone.
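Note that `_wfopen` is a Microsoft CRT extension, and this approach relies on `wchar_t` being 16 bits wide, as it is on Windows. Where that does not hold, the same idea of reading the raw bytes in binary mode can be done by hand. The following is only a sketch of that swapped-in technique, not the code from this answer: it assumes the file is UTF-16LE, and `read_first_line_utf16le` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the first line of a UTF-16LE file as UTF-16 code units.
// Assumes little-endian byte order; a leading BOM (U+FEFF) is skipped if present.
std::u16string read_first_line_utf16le(const char* path) {
    assert(path != nullptr);
    std::u16string line;
    std::FILE* input = std::fopen(path, "rb");  // binary mode: bytes arrive untouched
    if (!input) return line;                    // empty result if the file cannot be opened

    unsigned char pair[2];
    bool first = true;
    // Read two bytes at a time; each pair is one UTF-16 code unit.
    while (std::fread(pair, 1, 2, input) == 2) {
        char16_t unit = static_cast<char16_t>(pair[0] | (pair[1] << 8));
        if (first) {
            first = false;
            if (unit == 0xFEFF) continue;       // skip the byte order mark
        }
        if (unit == u'\n') break;               // end of the first line
        line.push_back(unit);
    }
    std::fclose(input);
    return line;
}
```

The result holds raw UTF-16 code units, so characters outside the Basic Multilingual Plane stay as surrogate pairs, and a carriage return before the newline is kept as-is.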
Source: https://stackoverflow.com/questions/30329347/how-to-read-cyrillic-unicode-file-in-c