I'm trying to write a function which compares the content of two files.
I want it to return 1 if files are the same, and 0 if different.
Switch's code looks good to me, but if you want an exact comparison the while condition and the return need to be altered:
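#include <stdio.h>
#include <string.h>

int compareFile(FILE* f1, FILE* f2) {
    enum { N = 10000 }; // compile-time constant, so the buffers are ordinary arrays
    char buf1[N];
    char buf2[N];

    do {
        size_t r1 = fread(buf1, 1, N, f1);
        size_t r2 = fread(buf2, 1, N, f2);

        // Different byte counts or different contents in this block
        if (r1 != r2 || memcmp(buf1, buf2, r1) != 0) {
            return 0; // Files are not equal
        }
    } while (!feof(f1) && !feof(f2));

    // Equal only if both files reached EOF at the same point
    return feof(f1) && feof(f2);
}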
Better to use fread and memcmp to avoid \0 character issues. Also, the !feof checks really should be || instead of &&, since there's a small chance that one file is bigger than the other and the smaller file's size is an exact multiple of your buffer size.
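#include <stdio.h>
#include <string.h>

int compareFile(FILE* f1, FILE* f2) {
    enum { N = 10000 }; // compile-time constant, so the buffers are ordinary arrays
    char buf1[N];
    char buf2[N];

    do {
        size_t r1 = fread(buf1, 1, N, f1);
        size_t r2 = fread(buf2, 1, N, f2);

        // Compare only the r1 bytes actually read, never the unused tail of the buffers
        if (r1 != r2 || memcmp(buf1, buf2, r1) != 0) {
            return 0;
        }
    } while (!feof(f1) || !feof(f2)); // keep going until both files are exhausted

    return 1;
}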
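Both versions above take FILE* handles that are already open. As a minimal sketch of how they might be called, the following C++ wrapper opens two paths in binary mode ("rb"), so that no newline translation alters the bytes on platforms such as Windows, and closes them afterwards. The name compareFilesByPath, the decision to treat a file that cannot be opened as "not equal", and the extra ferror check are assumptions made for illustration; they are not part of the original answers.

```cpp
#include <cstdio>
#include <cstring>

// Block-wise comparison in the style of the answer above.
// Returns 1 if the remaining contents of both files are equal, 0 otherwise.
static int compareFile(std::FILE* f1, std::FILE* f2) {
    const std::size_t N = 10000;
    char buf1[N];
    char buf2[N];

    do {
        std::size_t r1 = std::fread(buf1, 1, N, f1);
        std::size_t r2 = std::fread(buf2, 1, N, f2);

        if (r1 != r2 || std::memcmp(buf1, buf2, r1) != 0) {
            return 0;
        }
        // Assumption: a read error counts as "not equal" and also ends the loop
        if (std::ferror(f1) || std::ferror(f2)) {
            return 0;
        }
    } while (!std::feof(f1) || !std::feof(f2));

    return 1;
}

// Hypothetical wrapper: opens both paths in binary mode and compares them.
int compareFilesByPath(const char* path1, const char* path2) {
    std::FILE* f1 = std::fopen(path1, "rb");
    std::FILE* f2 = std::fopen(path2, "rb");

    int result = 0; // stays 0 if either file could not be opened
    if (f1 && f2) {
        result = compareFile(f1, f2);
    }

    if (f1) std::fclose(f1);
    if (f2) std::fclose(f2);
    return result;
}
```

With this in place, a call such as compareFilesByPath("a.bin", "b.bin") (file names made up here) would return 1 only when both files can be opened and contain identical bytes.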
Since you've allocated your arrays on the stack, they are filled with indeterminate values; they aren't zeroed out.
Secondly, strcmp will only compare up to the first NUL byte, which, in a binary file, won't necessarily be at the end of the file. Therefore you should really be using memcmp on your buffers. But again, this will give unpredictable results because your buffers were allocated on the stack: even if you compare two files that are the same, the ends of the buffers past the EOF may not be the same, so memcmp will still report false results (i.e., it will most likely report that the files differ when they are in fact identical, because of the leftover values at the end of the buffers past each file's EOF).
To get around this issue, you should first measure the length of each file by iterating through it and counting its bytes, then use malloc or calloc to allocate the buffers you're going to compare, and fill those buffers with the actual file contents. Then you should be able to make a valid comparison of the binary contents of each file. You'll also be able to work with files larger than 64K at that point, since you're dynamically allocating the buffers at run time.
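One way this might look in C++ is sketched below. It is an illustration under a few assumptions rather than this answer's exact method: it obtains the size by opening the stream positioned at the end and calling tellg instead of iterating through the file, and it uses std::vector<char> in place of malloc or calloc so the memory is released automatically. The names readWholeFile and sameContents are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the entire file at `path` into `out`.
// Returns false if the file cannot be opened, sized, or read.
bool readWholeFile(const std::string& path, std::vector<char>& out) {
    // Open in binary mode, positioned at the end, so tellg() yields the size
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return false;
    }

    const std::streamoff size = in.tellg();
    if (size < 0) {
        return false;
    }

    // The buffer is sized to the file, so there is no uninitialized tail
    out.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(out.data(), size)) {
        return false;
    }
    return true;
}

// Hypothetical helper: true only if both files are readable and byte-identical.
bool sameContents(const std::string& p1, const std::string& p2) {
    std::vector<char> a;
    std::vector<char> b;
    if (!readWholeFile(p1, a) || !readWholeFile(p2, b)) {
        return false;
    }
    return a == b; // compares sizes first, then every byte
}
```

Because both files are loaded completely into memory, this approach is best suited to files that comfortably fit in RAM; the block-wise versions elsewhere in this thread avoid that cost.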
If you can give up a little speed, here is a C++ way that requires little code:
#include <fstream>
#include <iterator>
#include <string>
#include <algorithm>
bool compareFiles(const std::string& p1, const std::string& p2) {
    std::ifstream f1(p1, std::ifstream::binary|std::ifstream::ate);
    std::ifstream f2(p2, std::ifstream::binary|std::ifstream::ate);

    if (f1.fail() || f2.fail()) {
        return false; //file problem
    }

    if (f1.tellg() != f2.tellg()) {
        return false; //size mismatch
    }

    //seek back to beginning and use std::equal to compare contents
    f1.seekg(0, std::ifstream::beg);
    f2.seekg(0, std::ifstream::beg);
    return std::equal(std::istreambuf_iterator<char>(f1.rdbuf()),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(f2.rdbuf()));
}
By using istreambuf_iterators you push the buffer size choice, actual reading, and tracking of eof into the standard library implementation. std::equal returns when it hits the first mismatch, so this should not run any longer than it needs to.
This is slower than Linux's cmp, but it's very easy to read.
When the files are binary, use memcmp not strcmp as \0 might appear as data.
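To make the difference concrete, here is a small sketch with two helpers that can be applied to buffers which are identical up to an embedded \0 and differ after it. The names equalAsStrings and equalAsBytes are made up for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Treats the buffers as C strings: comparison stops at the first '\0'.
bool equalAsStrings(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Treats the buffers as raw bytes: all n bytes are compared, '\0' included.
bool equalAsBytes(const char* a, const char* b, std::size_t n) {
    return std::memcmp(a, b, n) == 0;
}
```

With a = {'a','b','\0','c','d'} and b = {'a','b','\0','x','y'}, equalAsStrings(a, b) is true because strcmp stops at the \0, while equalAsBytes(a, b, 5) is false because the bytes after the \0 differ.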
Here's a C++ solution. It seems appropriate since your question is tagged as C++. The program uses ifstreams rather than FILE*s. It also shows you how to seek on a file stream to determine a file's size. Finally, it reads blocks of 4096 bytes at a time, so large files will be processed as expected.
// g++ -Wall -Wextra equifile.cpp -o equifile.exe
#include <iostream>
using std::cout;
using std::cerr;
using std::endl;
#include <fstream>
using std::ios;
using std::ifstream;
#include <exception>
using std::exception;
#include <algorithm> // std::min
#include <cstring>
#include <cstdlib>
using std::exit;
using std::memcmp;

bool equalFiles(ifstream& in1, ifstream& in2);

int main(int argc, char* argv[])
{
    if(argc != 3)
    {
        cerr << "Usage: equifile.exe <file1> <file2>" << endl;
        exit(-1);
    }
    try {
        ifstream in1(argv[1], ios::binary);
        ifstream in2(argv[2], ios::binary);
        if(equalFiles(in1, in2)) {
            cout << "Files are equal" << endl;
            exit(0);
        }
        else
        {
            cout << "Files are not equal" << endl;
            exit(1);
        }
    } catch (const exception& ex) {
        cerr << ex.what() << endl;
        exit(-2);
    }
    return -3;
}

bool equalFiles(ifstream& in1, ifstream& in2)
{
    // Seek to the end to learn each file's size, then rewind
    ifstream::pos_type size1, size2;
    size1 = in1.seekg(0, ifstream::end).tellg();
    in1.seekg(0, ifstream::beg);
    size2 = in2.seekg(0, ifstream::end).tellg();
    in2.seekg(0, ifstream::beg);
    // A failed stream (e.g. a file that could not be opened) counts as "not equal"
    if(!in1 || !in2 || size1 != size2)
        return false;
    static const size_t BLOCKSIZE = 4096;
    size_t remaining = size1;
    while(remaining)
    {
        char buffer1[BLOCKSIZE], buffer2[BLOCKSIZE];
        size_t size = std::min(BLOCKSIZE, remaining);
        in1.read(buffer1, size);
        in2.read(buffer2, size);
        // Stop on a read error rather than comparing stale buffer contents
        if(!in1 || !in2 || 0 != memcmp(buffer1, buffer2, size))
            return false;
        remaining -= size;
    }
    return true;
}