Question
The scenario is: I generated an MD5 checksum for a PDF file stored on the server using the following code:

    import hashlib
    # PdfFileReader/PdfFileWriter are assumed to come from PyPDF2 (pre-3.0 API)
    from PyPDF2 import PdfFileReader, PdfFileWriter

    def createMd5Hash(self, file_path, pdf_title, pdf_author):
        md5_returned = None
        try:
            md5 = hashlib.md5()
            with open(file_path, 'rb') as file_to_check:
                # feed the file contents into the hash
                for chunk in file_to_check:
                    md5.update(chunk)
                md5_file = md5.hexdigest()
                # append a custom key derived from the file's own digest
                custom_key = 'xyzkey-{}'.format(md5_file)
                md5.update(custom_key.encode())
                md5_returned = md5.hexdigest()
        except Exception as e:
            print("Error while calculate md5: {}".format(e))
        # code to add Hash value in metadata
        try:
            file = open(file_path, 'rb+')
            reader = PdfFileReader(file)
            writer = PdfFileWriter()
            writer.appendPagesFromReader(reader)
            metadata = reader.getDocumentInfo()
            writer.addMetadata(metadata)
            writer.addMetadata({
                '/Author': pdf_author,
                '/Title': pdf_title,
                '/HashKey': md5_returned,
            })
            writer.write(file)
            file.close()
        except Exception:
            print("Error while editing metadata")

Example: HashKey = 02c85672c041c8c762474799690ad1a5
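To make explicit which bytes feed the digest in the code above, here is a minimal standalone sketch of the hashing step only. It assumes the same `'xyzkey-'` prefix as the code above; the function name `keyed_md5` and the 8192-byte chunk size are illustrative choices, not part of the original code. It reads fixed-size chunks rather than iterating the binary file line by line, which should yield the same digest.

```python
import hashlib
import os
import tempfile


def keyed_md5(file_path, key_prefix='xyzkey-', chunk_size=8192):
    """Return md5(file bytes + key_prefix + md5-hex-of-file-bytes) as hex.

    Hypothetical helper mirroring the hashing step of createMd5Hash.
    """
    md5 = hashlib.md5()
    with open(file_path, 'rb') as fh:
        # iter() with a sentinel stops when read() returns b'' at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            md5.update(chunk)
    # hexdigest() does not reset the hash object, so the update below
    # appends the key to the file bytes that were already fed in.
    file_digest = md5.hexdigest()
    md5.update('{}{}'.format(key_prefix, file_digest).encode())
    return md5.hexdigest()


if __name__ == '__main__':
    data = b'%PDF-1.4 placeholder bytes, not a real PDF\n'
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        inner = hashlib.md5(data).hexdigest()
        expected = hashlib.md5(data + b'xyzkey-' + inner.encode()).hexdigest()
        print(keyed_md5(tmp.name) == expected)  # True: both constructions agree
    finally:
        os.remove(tmp.name)
```

In other words, the stored value depends on every byte of the file as it existed at the moment of hashing, plus the key suffix.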
The second part adds the metadata, including the hash value, to the PDF. Clients can then download this file from my server and may or may not modify it. When I receive the file back from a client, I want to check its integrity, that is, whether the file data has been modified. For that, I wrote a utility that uploads the file and extracts the PDF metadata, from which I get the HashKey value, the original hash of the file from when it was generated. When I recompute the hash using the following function, I expect to get the same value if the file was just downloaded from the server and not modified by the client.

    def validateMd5Hash(file_path, current_md5):
        md5_returned = None
        try:
            md5 = hashlib.md5()
            with open(file_path, 'rb') as file_to_check:
                # read contents of the file
                for chunk in file_to_check:
                    md5.update(chunk)
                # pipe contents of the file through
                md5_file = md5.hexdigest()
                private_key = 'xyzkey-{}'.format(md5_file)
                md5.update(private_key.encode())
                md5_returned = md5.hexdigest()
            if md5_returned != current_md5:
                return True
            return False
        except Exception as e:
            print("Error while calculate md5: {}".format(e))

But the result is different:

    md5_returned = fbf79424a68892887379108a05968437
    current_md5 = 02c85672c041c8c762474799690ad1a5

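One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: in `createMd5Hash` the digest is taken before the file is rewritten with the new metadata, so the bytes hashed later are no longer the bytes hashed originally. The sketch below imitates that sequence with plain bytes instead of a real PDF; `digest_of` and the fake `/HashKey` trailer are hypothetical stand-ins.

```python
import hashlib


def digest_of(data):
    """MD5 hex digest of a bytes object (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# Stand-in for the file as it exists when the hash is first computed.
original_bytes = b'page content'
stored_hash = digest_of(original_bytes)

# Stand-in for rewriting the file with the hash embedded in its metadata.
rewritten_bytes = original_bytes + b' /HashKey ' + stored_hash.encode()

# Hashing the rewritten file covers the embedded hash as well,
# so the result differs from the value that was stored.
recomputed_hash = digest_of(rewritten_bytes)
print(stored_hash == recomputed_hash)  # False
```

If this is what happens with the real file, a mismatch would appear even for a file that no client has touched.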
When I use validateMd5Hash, it generates a different hash value than the one computed at creation. My guess is that when the client downloads the file, the created and modified dates change, which is why I get a new MD5 checksum rather than the one stored in HashKey. I am looking to generate a hash value for only the content of the file, without including the metadata. Can anyone help me, please?
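As a quick check on the guess about dates: `hashlib` only sees the bytes passed to `update()`, so file system timestamps by themselves do not enter the digest. The following sketch, which uses a temporary file and the hypothetical helpers `file_md5` and `digest_survives_touch`, changes a file's access and modification times with `os.utime` and compares the digest before and after. Dates stored inside the PDF itself (such as `/ModDate` in its metadata) are part of the file bytes and would affect the digest.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=8192):
    """MD5 hex digest of a file's bytes, read in fixed-size chunks."""
    md5 = hashlib.md5()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()


def digest_survives_touch(data):
    """Return True if changing a file's timestamps leaves its MD5 unchanged."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        before = file_md5(tmp.name)
        # Set access and modification times to an arbitrary past moment.
        os.utime(tmp.name, (1_000_000_000, 1_000_000_000))
        after = file_md5(tmp.name)
        return before == after
    finally:
        os.remove(tmp.name)


print(digest_survives_touch(b'placeholder bytes, not a real PDF'))  # True
```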
Source: https://stackoverflow.com/questions/64566153/how-do-i-calculate-the-md5-checksum-of-a-file-contents-in-python