Question
I am facing some issues with non-ASCII characters in C++. I have a file containing non-ASCII characters, which I read in C++ using file streams. After reading the file (say 1.txt), I store the data in a string stream and write it to another file (say 2.txt).
Assume 1.txt contains:
ação
In 2.txt I should get the same output, but the non-ASCII characters are printed as their hex values.
Also, I am quite sure that C++ handles the ASCII characters as plain ASCII.
Please help me print these characters correctly in 2.txt.
EDIT:
First, the pseudo-code for the whole process:
1. A shell script reads one value from the DB and stores it in 11.txt.
2. The C++ code (a.cpp) reads 11.txt and writes to f.txt.
Data present in the DB that is being read: Instalação
File 11.txt contains: Instalação
File f.txt contains: Instalação
Output of a.cpp on screen: Instalação
a.cpp
#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <fstream>
#include <iomanip>
#include <string>
using namespace std;
int main()
{
    ifstream myReadFile;
    ofstream f2;
    myReadFile.open("11.txt");
    f2.open("f2.txt");
    string output;
    if (myReadFile.is_open())
    {
        // Test the extraction itself: looping on eof() can process the
        // last word twice when the file ends with a newline.
        while (myReadFile >> output)
        {
            //cout<<output;
            cout << "\n";
            std::stringstream tempDummyLineItem;
            tempDummyLineItem << output;
            cout << tempDummyLineItem.str();
            f2 << tempDummyLineItem.str();
        }
    }
    myReadFile.close();
    return 0;
}
The locale command reports:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Answer 1:
At least if I understand what you're after, I'd do something like this:
#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <iomanip>
#include <string>

// Format one byte as a \xNN escape sequence.
std::string to_hex(char ch) {
    std::ostringstream b;
    b << "\\x" << std::setfill('0') << std::setw(2)
      << std::hex << static_cast<unsigned int>(ch & 0xff);
    return b.str();
}

int main() {
    // for test purposes, we'll use a stringstream for input
    std::stringstream infile("normal stuff. weird stuff:\x01\xee:back to normal");
    // keep whitespace when reading character by character
    infile << std::noskipws;
    // copy input to output, converting non-ASCII to hex:
    std::transform(std::istream_iterator<char>(infile),
                   std::istream_iterator<char>(),
                   std::ostream_iterator<std::string>(std::cout),
                   [](char ch) {
                       return (ch >= ' ') && (ch < 127) ?
                           std::string(1, ch) :
                           to_hex(ch);
                   });
}
Answer 2:
This sounds to me like a UTF-8 issue. Since you didn't tag your question with C++11, here is an excellent article on Unicode and C++ streams.
From your updated code, let me explain what is happening. You create a file stream to read your file. Internally, the file stream only recognizes chars until you tell it otherwise. A char, on most machines, can only hold 8 bits of data, but the characters in your file use more than 8 bits. To read your file correctly, you need to know how it is encoded. The most common encoding is UTF-8, which uses between 1 and 4 chars for each character.
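To make the "1 to 4 chars per character" point concrete, here is a minimal sketch that counts characters (code points) in a UTF-8 string as opposed to chars (bytes). The helper name count_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Every code point starts with exactly one byte that is NOT a
// continuation byte (continuation bytes have the bit pattern 10xxxxxx),
// so counting the non-continuation bytes counts the code points.
// Assumption: the input is valid UTF-8; nothing is validated here.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the question's ação encoded as UTF-8 (written in source as "a\xC3\xA7\xC3\xA3o"), std::string::size() reports 6 while count_code_points reports 4, because ç and ã each take two chars.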
Once you know your encoding, you can either use wifstream (for UTF-16) or imbue() a locale for other encodings.
Update: If your file is ISO-8859-1 (from your comment above), try this:
wifstream myReadFile;
myReadFile.imbue(std::locale("en_US.iso88591"));
myReadFile.open("11.txt");
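Note that std::locale throws std::runtime_error when the named locale is not installed on the machine. As an alternative that does not depend on installed locales, the following is a minimal sketch of converting ISO-8859-1 bytes to UTF-8 by hand. This is a different technique from the imbue() call above, not a replacement prescribed by it. The helper name latin1_to_utf8 is hypothetical, and the sketch assumes the input really is ISO-8859-1, where each byte value equals its Unicode code point.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) text to UTF-8.
// Assumption: every input byte is a Latin-1 character, so its value is
// also its Unicode code point (U+0000 to U+00FF).
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: two bytes per input byte
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range: identical in both encodings.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // U+0080 to U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

With this helper, the Latin-1 bytes of ação ("a\xE7\xE3o") become the UTF-8 bytes "a\xC3\xA7\xC3\xA3o", which a terminal or editor using the en_US.UTF-8 locale from the question can display.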
Source: https://stackoverflow.com/questions/17648966/handling-non-ascii-chars-in-c