Reading Chinese character from .csv file in C/C++

Submitted by 荒凉一梦 on 2020-02-15 05:48:46

Question


I have a .csv file with Chinese characters. I need to read in these Chinese characters and store them for further use in the program. I know that Chinese characters have to be processed in a UTF format, using wchar_t and the like, but I am not able to figure out exactly how this is done. Can anyone please help me out?


Answer 1:


First of all, there is no unique way to encode Chinese characters. To be able to decode the file, you first have to know which encoding has been used.

The most common ones are UTF-8, UTF-16, Big5 and GB2312. GB2312 is for simplified characters and is mostly used in mainland China; Big5 is for traditional characters and is mostly used in Taiwan and Hong Kong. Most international companies use UTF-8 or UTF-16.

In UTF-8 the encoding has variable length with a 1-byte code unit. It is typically more space-efficient for text containing many ASCII characters, since these take up only one byte in UTF-8 (most Chinese characters, on the other hand, take three bytes). In UTF-16 the code unit is 2 bytes, and characters also have variable length (one or two code units).

It is also worthwhile to read Joel Spolsky's article on Unicode: http://www.joelonsoftware.com/articles/Unicode.html

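Let's suppose the CSV file is encoded in UTF-8. Then you have to tell the stream which encoding to use. With the following code, the file is interpreted as UTF-8 and converted to wchar_t, which has a fixed size (2 bytes on Windows and 4 bytes on Linux). Note that with a 2-byte wchar_t, std::codecvt_utf8 decodes to UCS-2, so characters outside the Basic Multilingual Plane need std::codecvt_utf8_utf16 instead: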

const std::locale utf8_locale
    = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
std::wifstream file;
file.imbue(utf8_locale);  // imbue before opening, so the facet is in place before any I/O
file.open("filename");

You can then read and process the file line by line, for example like this:

  std::wstring s;
  while (std::getline(file, s))
  {
      // Do something with the string, e.g. locate the first ';' separator
      auto end1 = s.find_first_of(L';');
      // ...
  }

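For this you'll need these headers (note that <codecvt> and std::codecvt_utf8 have been deprecated since C++17, although current standard libraries still ship them):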

#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>


Source: https://stackoverflow.com/questions/9900596/reading-chinese-character-from-csv-file-in-c-c
