I don't care what the differences are. I just want to know whether the contents are different.
Since I can't comment on the answers of others I'll write my own.
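If you use MD5, you should not simply call md5.update(f.read()), because that loads the whole file into memory at once and will use too much memory on large files. Read and hash the file in chunks instead: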

    import hashlib

    def get_file_md5(f, chunk_size=8192):
        # f is a file object opened in binary mode ('rb')
        h = hashlib.md5()
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            h.update(chunk)
        return h.hexdigest()

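To show how the chunked hash might be used to answer the original question, here is a minimal sketch. The helper name `files_differ` is my own (it is not part of any library), and it assumes both paths point to readable regular files. The files are opened in binary mode so the hash is computed over the raw bytes.

```python
import hashlib

def get_file_md5(f, chunk_size=8192):
    # Hash the file object in fixed-size chunks to keep memory use bounded
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()

def files_differ(path1, path2):
    # Hypothetical helper: True if the two files hash to different digests
    with open(path1, 'rb') as f1, open(path2, 'rb') as f2:
        return get_file_md5(f1) != get_file_md5(f2)
```

Equal digests are strong evidence that the contents match, although a hash comparison cannot rule out a collision entirely.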
I would compare MD5 hashes of the files' contents.

    import hashlib

    def checksum(f):
        # Open in binary mode so the raw bytes are hashed
        md5 = hashlib.md5()
        with open(f, 'rb') as fh:
            md5.update(fh.read())
        return md5.hexdigest()

    def is_contents_same(f1, f2):
        return checksum(f1) == checksum(f2)

    if not is_contents_same('foo.txt', 'bar.txt'):
        print('The contents are not the same!')

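If a hash is not needed for anything else, the standard library's `filecmp` module can answer the same yes/no question directly. This is a different technique from the MD5 approach above, shown only as a sketch. The wrapper name `contents_differ` is my own. Passing `shallow=False` asks `filecmp.cmp` to compare the file contents rather than trusting matching `os.stat()` signatures.

```python
import filecmp

def contents_differ(path1, path2):
    # Hypothetical wrapper: shallow=False makes filecmp compare the
    # actual bytes instead of relying only on size and modification time
    return not filecmp.cmp(path1, path2, shallow=False)
```

For example, `contents_differ('foo.txt', 'bar.txt')` would return `True` when the two files do not have identical contents.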