I have a .csv file with Chinese characters. I need to read in these Chinese characters and store them for further use in the program. I know that Chinese characters have to
First of all, there is no unique way to encode Chinese characters. To be able to decode the file, you first have to know which encoding has been used.
The most common ones are UTF-8, UTF-16, Big5 and GB2312. GB2312 is for simplified characters and is mostly used in mainland China. Big5 is for traditional characters and is mostly used in Taiwan and Hong Kong. Most international companies would use UTF-8 or UTF-16. UTF-8 is a variable-length encoding with a unit length of 1 byte (1 to 4 bytes per character). It is typically more efficient for text that contains a lot of ASCII characters, since these take up only one byte each. UTF-16 is also a variable-length encoding, but with a unit length of 2 bytes (2 or 4 bytes per character).
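To make the size difference concrete, here is a minimal sketch that computes how many bytes UTF-8 and how many 16-bit units UTF-16 need for a single Unicode code point. The function names `utf8_length` and `utf16_units` are made up for this illustration, and the code assumes its input is a valid code point (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes UTF-8 uses for one code point.
// Assumes code_point is a valid Unicode scalar value.
std::size_t utf8_length(std::uint32_t code_point)
{
    if (code_point < 0x80)    return 1; // ASCII
    if (code_point < 0x800)   return 2;
    if (code_point < 0x10000) return 3; // most common Chinese characters
    return 4;                           // supplementary planes
}

// Hypothetical helper: number of 16-bit code units UTF-16 uses for one code point.
std::size_t utf16_units(std::uint32_t code_point)
{
    return code_point < 0x10000 ? 1 : 2; // 2 means a surrogate pair
}
```

For example, the letter A (U+0041) takes 1 byte in UTF-8 and 2 bytes in UTF-16, while 中 (U+4E2D) takes 3 bytes in UTF-8 and 2 bytes in UTF-16.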
It is also worthwhile to read Joel Spolsky's article on Unicode: http://www.joelonsoftware.com/articles/Unicode.html
Let's suppose the CSV file is encoded in UTF-8, so you have to specify that encoding when reading it. With the following, the file is interpreted as UTF-8 and converted to wchar_t, which has a fixed size (2 bytes on Windows and 4 bytes on Linux):
const std::locale utf8_locale
    = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
std::wifstream file("filename");
file.imbue(utf8_locale);
You can then read and process the file, for example like this:
std::wstring s;
while (std::getline(file, s))
{
    // Do something with the string, e.g. find the first ';' separator
    auto end1 = s.find_first_of(L';');
    ...
}
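The loop body is left open above. As one possible way to fill it in, the sketch below splits a line into its fields. It assumes the fields are separated by ';' (as in the `find_first_of` call) and that they contain no quoting or escaped separators. `split_fields` is a name invented for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one line into fields at each delimiter.
// Assumes a plain format without quoted or escaped delimiters.
std::vector<std::wstring> split_fields(const std::wstring& line,
                                       wchar_t delimiter = L';')
{
    std::vector<std::wstring> fields;
    std::wstring::size_type start = 0;
    while (true)
    {
        const auto end = line.find(delimiter, start);
        if (end == std::wstring::npos)
        {
            // Last field: everything after the final delimiter.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1; // continue after the delimiter
    }
    return fields;
}
```

Inside the loop you could then call `split_fields(s)` and store the resulting vector. A trailing separator produces an empty last field.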
The file-reading code above needs these header files:
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>
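Putting the pieces together, a minimal sketch of a function that reads a whole UTF-8 file into wide strings could look like this. The name `read_utf8_lines` and the choice to throw on a missing file are assumptions made for this example. Note also that `std::codecvt_utf8` has been deprecated since C++17, so newer compilers may warn about it.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of a UTF-8 encoded file
// and returns the lines as wide strings.
std::vector<std::wstring> read_utf8_lines(const std::string& path)
{
    std::wifstream file(path);
    if (!file.is_open())
        throw std::runtime_error("cannot open " + path);

    // The locale takes ownership of the facet, so no delete is needed.
    file.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>()));

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(file, line))
        lines.push_back(line);
    return lines;
}
```

Calling `read_utf8_lines("filename")` then gives you all lines for further processing. On Windows, where wchar_t is 2 bytes, characters outside the Basic Multilingual Plane may not survive this conversion.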