What is the easiest way to check (in a unit test) whether binary files A and B are equal?
I had to do the same in a unit test, so I used SHA-1 hashes. To avoid computing the hashes unnecessarily, I first check whether the file sizes are equal. Here was my attempt:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.junit.Assert; // assumes JUnit 4

public class SHA1Compare {
    private static final int CHUNK_SIZE = 4096;

    public void assertEqualsSHA1(String expectedPath, String actualPath) throws IOException, NoSuchAlgorithmException {
        File expectedFile = new File(expectedPath);
        File actualFile = new File(actualPath);
        // Cheap check first: files of different sizes cannot be equal.
        Assert.assertEquals(expectedFile.length(), actualFile.length());
        try (FileInputStream fisExpected = new FileInputStream(expectedFile);
             FileInputStream fisActual = new FileInputStream(actualFile)) {
            Assert.assertEquals(makeMessageDigest(fisExpected),
                    makeMessageDigest(fisActual));
        }
    }

    public String makeMessageDigest(InputStream is) throws NoSuchAlgorithmException, IOException {
        byte[] data = new byte[CHUNK_SIZE];
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        int bytesRead;
        // Feed the stream to the digest one chunk at a time.
        while (-1 != (bytesRead = is.read(data, 0, CHUNK_SIZE))) {
            md.update(data, 0, bytesRead);
        }
        return toHexString(md.digest());
    }

    private String toHexString(byte[] digest) {
        StringBuilder sha1HexString = new StringBuilder();
        for (byte b : digest) {
            // Two lowercase hex digits per byte.
            sha1HexString.append(String.format("%02x", b));
        }
        return sha1HexString.toString();
    }
}
Are third-party libraries fair game? Guava has Files.equal(File, File). There's no real reason to bother with hashing if you don't have to; it can only be less efficient.
With assertBinaryEquals from the FileAssert class in JUnit-addons:
public static void assertBinaryEquals(java.io.File expected,
java.io.File actual)
http://junit-addons.sourceforge.net/junitx/framework/FileAssert.html
If you want to avoid dependencies, you can do it quite nicely with Files.readAllBytes and Assert.assertArrayEquals:
Assert.assertArrayEquals("Binary files differ",
Files.readAllBytes(Paths.get(expectedBinaryFile)),
Files.readAllBytes(Paths.get(actualBinaryFile)));
Note: This reads both files entirely into memory, so it may not be suitable for large files.
Since Java 12 you can also use the Files.mismatch method (see its JavaDoc). It returns -1L if the files are the same.
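As a minimal sketch of the Files.mismatch approach (Java 12 or later), something like the following should work. The class name MismatchExample and the helper sameContent are made up for illustration; only Files.mismatch itself comes from the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MismatchExample {
    /** Hypothetical helper: true when both files have identical content. */
    public static boolean sameContent(Path expected, Path actual) throws IOException {
        // Files.mismatch returns -1L when there is no mismatch; otherwise it
        // returns the position of the first differing byte (or the size of the
        // shorter file when one file is a prefix of the other).
        return Files.mismatch(expected, actual) == -1L;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("expected", ".bin");
        Path b = Files.createTempFile("actual", ".bin");
        try {
            Files.write(a, new byte[] {1, 2, 3, 4});
            Files.write(b, new byte[] {1, 2, 9, 4});
            System.out.println(Files.mismatch(a, b)); // prints 2
            System.out.println(sameContent(a, a));    // prints true
        } finally {
            Files.deleteIfExists(a);
            Files.deleteIfExists(b);
        }
    }
}
```

In a test, the returned position can go into the failure message, which tells you where the files diverge rather than only that they differ.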
There's always just reading byte by byte from each file and comparing them as you go. MD5, SHA-1, etc. still have to read all the bytes, so computing the hash is extra work that you don't have to do.
// The method name is illustrative; needs java.io.* imports.
static boolean contentEquals(File file1, File file2) throws IOException {
    if (file1.length() != file2.length()) {
        return false;
    }
    try (InputStream in1 = new BufferedInputStream(new FileInputStream(file1));
         InputStream in2 = new BufferedInputStream(new FileInputStream(file2))) {
        int value1, value2;
        do {
            // since we're buffered, read() isn't expensive
            value1 = in1.read();
            value2 = in2.read();
            if (value1 != value2) {
                return false;
            }
        } while (value1 >= 0);
        // since we already checked that the file sizes are equal,
        // if we're here we reached the end of both files without a mismatch
        return true;
    }
}
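A variant of the same idea compares a whole buffer at a time instead of single bytes. This is only a sketch under a few assumptions: it needs Java 9 or later (for InputStream.readNBytes and the range version of Arrays.equals), and the class name ChunkedCompare, the method contentEquals, and the 8192-byte buffer size are arbitrary choices, not part of the answer above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ChunkedCompare {
    private static final int CHUNK_SIZE = 8192; // arbitrary buffer size (assumption)

    public static boolean contentEquals(Path first, Path second) throws IOException {
        // Cheap check first: files of different sizes cannot be equal.
        if (Files.size(first) != Files.size(second)) {
            return false;
        }
        try (InputStream in1 = Files.newInputStream(first);
             InputStream in2 = Files.newInputStream(second)) {
            byte[] buf1 = new byte[CHUNK_SIZE];
            byte[] buf2 = new byte[CHUNK_SIZE];
            while (true) {
                // readNBytes keeps reading until the buffer is full or the stream ends.
                int n1 = in1.readNBytes(buf1, 0, CHUNK_SIZE);
                int n2 = in2.readNBytes(buf2, 0, CHUNK_SIZE);
                if (n1 != n2 || !Arrays.equals(buf1, 0, n1, buf2, 0, n2)) {
                    return false;
                }
                if (n1 < CHUNK_SIZE) {
                    return true; // both streams ended without a mismatch
                }
            }
        }
    }
}
```

Like the byte-by-byte loop, this stops at the first difference and never holds more than two buffers in memory.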