问题
When i read from a file string by string, >> operation gets first string but it starts with "i" . Assume that first string is "street", than it gets as "istreet".
Other strings are okay. I tried for different txt files. The result is same. First string starts with "i". What is the problem?
Here is my code :
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int cube(int x){ return (x*x*x);}
int main(){
int maxChar;
int lineLength=0;
int cost=0;
cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;
fstream inFile("bla.txt",ios::in);
if (!inFile) {
cerr << "Unable to open file datafile.txt";
exit(1); // call system to stop
}
while(!inFile.eof()) {
string word;
inFile >> word;
cout<<word<<endl;
cout<<word.length()<<endl;
if(word.length()+lineLength<=maxChar){
lineLength +=(word.length()+1);
}
else {
cost+=cube(maxChar-(lineLength-1));
lineLength=(word.length()+1);
}
}
}
回答1:
You're seeing a UTF-8 Byte Order Mark (BOM). It was added by the application that created the file.
To detect and ignore the marker you could try this (untested) function:
bool SkipBOM(std::istream & in)
{
char test[4] = {0};
in.read(test, 3);
if (strcmp(test, "\xEF\xBB\xBF") == 0)
return true;
in.seekg(0);
return false;
}
回答2:
With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.
// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
char test[3] = {0};
in.read(test, 3);
if ((unsigned char)test[0] == 0xEF &&
(unsigned char)test[1] == 0xBB &&
(unsigned char)test[2] == 0xBF)
{
return;
}
in.seekg(0);
}
To use:
ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
// Process lines of input here.
}
回答3:
Here is another two ideas.
- if you are the one who create the files, save they length along with them, and when reading them, just cut all the prefix with this simple calculation: trueFileLength - savedFileLength = numOfByesToCut
- create your own prefix when saving the files, and when reading search for it and delete all what you found before.
来源:https://stackoverflow.com/questions/10417613/c-reading-from-file-puts-three-weird-characters