Handling Non-ASCII Chars in C++

前端未结


I am facing some issues with non-ASCII chars in C++. I have a file containing non-ASCII chars, which I am reading in C++ via file handling. After reading the file (say 1.txt)


2 Answers

没有蜡笔的小新

2021-01-01 05:36

At least if I understand what you're after, I'd do something like this:

```
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Format one byte as a \xNN escape, e.g. 0xee -> "\xee".
std::string to_hex(char ch) {
    std::ostringstream b;
    // Mask to 0..255 so a negative (signed) char doesn't print as ffffffee.
    b << "\\x" << std::setfill('0') << std::setw(2)
        << std::hex << static_cast<unsigned int>(ch & 0xff);
    return b.str();
}

int main(){
    // for test purposes, we'll use a stringstream for input
    std::stringstream infile("normal stuff. weird stuff:\x01\xee:back to normal");

    // istream_iterator reads with operator>>, which skips whitespace unless told not to.
    infile >> std::noskipws;

    // copy input to output, converting non-ASCII (and control) chars to hex:
    std::transform(std::istream_iterator<char>(infile),
        std::istream_iterator<char>(),
        std::ostream_iterator<std::string>(std::cout),
        [](char ch) {
            return (ch >= ' ') && (ch < 127) ?
                std::string(1, ch) :
                to_hex(ch);
    });
    std::cout << '\n';
}
```


情深已故

2021-01-01 05:44
Sounds to me like a UTF-8 issue. Since you didn't tag your question with c++11, here is an excellent article on Unicode and C++ streams.

From your updated code, let me explain what is happening. You create a file stream to read your file. Internally, the file stream deals only in chars until you tell it otherwise. A char on most machines holds only 8 bits, but some of the characters in your file need more than 8 bits. To read your file correctly, you NEED to know how it is encoded. The most common encoding is UTF-8, which uses between 1 and 4 bytes (chars) per character: ASCII characters take 1 byte, everything else takes 2 to 4.

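Once you know the encoding, you can read into wide characters with wifstream and imbue() it with a locale whose conversion (codecvt) facet matches the file's encoding, whether that is UTF-8, UTF-16, ISO-8859-1, or something else.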

Update: If your file is ISO-8859-1 (from your comment above), try this:
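```
#include <fstream>
#include <locale>

std::wifstream myReadFile;
// Locale names are platform-specific; std::locale throws std::runtime_error
// if "en_US.iso88591" is not installed. Imbue before calling open().
myReadFile.imbue(std::locale("en_US.iso88591"));
myReadFile.open("11.txt");
```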