Compare checksum of files between two servers and report mismatch

Submitted by 微笑、不失礼 on 2019-12-21 12:16:07

Question


I need to compare the checksums of all files in the /primary and /secondary folders on machineA against the files in the /bat/snap/ folder on the remote server machineB. The remote server holds many other files in addition to the ones we have on machineA.

  • If any checksum does not match, I want to report all the affected files on machineA with their full paths and exit with a non-zero status code.
  • If everything is matching then exit zero.

I wrote one command (I am not sure whether there is a better way to write it) that I run on machineA, but it's very slow. Is there any way to make it faster?

(cd /primary && find . -type f -exec md5sum {} +; cd /secondary && find . -type f -exec md5sum {} +) | ssh machineB '(cd /bat/snap/ && md5sum -c)'

It also prints file names like this: ./abc_monthly_1536_proc_7.data: OK. Is there a way to make it print the full path of that file on machineA?

Running ssh to the remote host for every file definitely isn't efficient. parallel could speed it up by handling more files concurrently, but the more efficient way is probably to tweak the command so that it connects to machineB once and gets all the md5sums in one shot. Is this possible?


Answer 1:


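If your primary goal is not to compute the checksums but to list the differences, a faster (and easier) way might be to run rsync with the --dry-run option. Any file it lists differs between the two sides. Note that rsync compares size and modification time by default; add -c (--checksum) if you want it to compare by checksum. For example: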

MBP:~ jhartman$ rsync -avr --dry-run rsync-test 192.168.1.100:/tmp/; echo $?
building file list ... done
rsync-test/file1.txt

sent 172 bytes  received 26 bytes  396.00 bytes/sec
total size is 90  speedup is 0.45

Of course, because of --dry-run, no files are changed on the target.
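
As a rough sketch of how the dry run could be turned into a pass/fail check: the helper below runs rsync with -c so that files are compared by checksum, prints the full local path of every file that would be transferred, and exits non-zero if there are any. The function name list_changed and its exit codes are illustrative assumptions, not part of rsync.

```shell
#!/bin/sh
# list_changed SRC_DIR DEST   (hypothetical helper name)
# Prints the full path of each file under SRC_DIR that is missing or has
# different content at DEST. Exit status: 0 = all match, 1 = differences,
# 2 = rsync itself failed.
list_changed() (
    src=${1%/}
    # -r recurse, -c compare by checksum, -n dry run (nothing is copied);
    # --out-format='%n' prints one relative name per item to be transferred.
    out=$(rsync -rcn --out-format='%n' "$src/" "$2") || exit 2
    # Directory entries end in "/", so drop them and keep only files.
    changed=$(printf '%s\n' "$out" | grep -v '/$')
    [ -z "$changed" ] && exit 0
    printf '%s\n' "$changed" | while IFS= read -r name; do
        printf '%s/%s\n' "$src" "$name"
    done
    exit 1
)
```

For the setup in the question it could be called as list_changed /primary machineB:/bat/snap/ and then again for /secondary. Because --delete is not used, files that exist only on machineB are not reported.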

I hope this helps, Jarek




Answer 2:


If the files are directly in /primary and /secondary rather than in subdirectories beneath them, drop the find. You may also wish to parallelize the MD5 calculation. That would give:

#!/bin/bash
# Hash both directories in parallel, then check everything in one ssh call.
cd /primary || exit 1
md5sum * > /tmp/file-p &
cd /secondary || exit 1
md5sum * > /tmp/file-s &
wait
cat /tmp/file-p /tmp/file-s | ssh machineB '(cd /bat/snap/ && md5sum -c)'

With a relatively small set of files:

$ time find . -exec md5sum {} \;
7e74a9f865a91c5b56b5cab9709f1f36  ./file
631f01c98ff2016971fb1ea22be3c2cf  ./hosts
d41d8cd98f00b204e9800998ecf8427e  ./fortune8547
49d05af711e2d473f12375d720fb0a92  ./vboxdrv-Module.symvers
bf4b1d740f7151dea0f42f5e9e2b0c34  ./tmpavG1pB
a9b0d3af1b80a46b92dfe1ce56b2e85c  ./in.clean.4524

real    0m0.046s
user    0m0.035s
sys     0m0.006s
$ time md5sum *
7e74a9f865a91c5b56b5cab9709f1f36  file
d41d8cd98f00b204e9800998ecf8427e  fortune8547
631f01c98ff2016971fb1ea22be3c2cf  hosts
a9b0d3af1b80a46b92dfe1ce56b2e85c  in.clean.4524
bf4b1d740f7151dea0f42f5e9e2b0c34  tmpavG1pB
49d05af711e2d473f12375d720fb0a92  vboxdrv-Module.symvers

real    0m0.005s
user    0m0.003s
sys     0m0.002s

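(Just to show that find is not always the quickest. Note that this timing uses -exec ... \;, which starts one md5sum process per file; the -exec ... + form used in the question batches many files into far fewer invocations.)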
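
Much of the gap in a timing like this comes from how many processes find has to start. A minimal sketch that makes this visible, using echo in place of md5sum so that each invocation prints exactly one line (the function name count_invocations and its mode names are made up for illustration):

```shell
#!/bin/sh
# count_invocations DIR MODE   (hypothetical helper name)
# Reports how many times find runs its -exec command for the files in DIR.
# echo prints one line per invocation, so counting lines counts invocations.
count_invocations() (
    cd "$1" || exit 1
    case $2 in
        per-file) find . -type f -exec echo {} \; ;;  # one process per file
        batched)  find . -type f -exec echo {} + ;;   # many files per process
    esac | wc -l
)
```

For a directory holding three files, per-file reports 3 and batched reports 1. The count assumes file names without embedded newlines.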




Answer 3:


md5sum can check files against a list of checksums given as input.

From man md5sum, the following two options are useful:

  • -c, --check: read MD5 sums from the FILEs and check them
  • --quiet: don't print OK for each successfully verified file

So all we need to do is build such a list and pass it on. The easiest way is the following (run from machineA):

$ cd /primary && md5sum * | ssh machineB 'cd /bat/snap && md5sum -c --quiet - 2>/dev/null'
$ cd /secondary && md5sum * | ssh machineB 'cd /bat/snap && md5sum -c --quiet - 2>/dev/null'

This will report failures as:

file1: FAILED
file2: FAILED open or read

This gives you all the failed files per directory. You can do any post-processing afterwards with any flavour of awk.
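
One possible form of that post-processing, sketched in plain shell rather than awk: the helper below hashes a local directory, pipes the list to whatever check command it is given, rewrites each reported name into a full local path, and passes the checker's exit status through. The name check_dir and its calling convention are assumptions made for this example.

```shell
#!/bin/sh
# check_dir LOCAL_DIR CHECK_CMD...   (hypothetical helper name)
# Hashes every file under LOCAL_DIR, feeds the list to CHECK_CMD on stdin,
# and prints each line CHECK_CMD reports with the local full path in front.
# Exit status is that of CHECK_CMD (md5sum -c, or ssh if the link fails).
check_dir() (
    dir=${1%/}
    shift
    # The pipeline's status is the status of its last command: CHECK_CMD.
    out=$(cd "$dir" && find . -type f -exec md5sum {} + | "$@")
    rc=$?
    printf '%s\n' "$out" | while IFS= read -r line; do
        # "./name: FAILED" becomes "LOCAL_DIR/name: FAILED"
        [ -n "$line" ] && printf '%s/%s\n' "$dir" "${line#./}"
    done
    exit "$rc"
)
```

For the setup in the question it might be invoked as check_dir /primary ssh machineB 'cd /bat/snap && md5sum -c --quiet - 2>/dev/null', and again for /secondary, exiting non-zero if either call does. ssh returns the exit status of the remote command, so a checksum mismatch and a failed connection both surface as a non-zero status. An empty directory is also reported as a failure, since md5sum -c rejects input that contains no checksum lines.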




Answer 4:


You can try to parallelize the process mentioned in the other answer: change the + to \; and have bash run each md5sum in the background with &.

find "$(pwd)" -type f -exec bash -c 'md5sum "$1" &' _ {} \;
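
Starting one background shell per file puts no upper limit on the number of md5sum processes running at once. A bounded alternative is sketched below, assuming find -print0 and xargs -0/-P are available (they are GNU and BSD extensions, not POSIX). The function name parallel_md5, the default of 4 jobs, and the batch size of 32 are arbitrary choices for illustration.

```shell
#!/bin/sh
# parallel_md5 DIR [JOBS]   (hypothetical helper name)
# Hashes every file under DIR with at most JOBS md5sum processes at a time.
parallel_md5() (
    dir=$1
    jobs=${2:-4}
    # -print0 / -0 keep file names with spaces intact;
    # -n 32 hands each md5sum up to 32 files, -P caps concurrency.
    find "$dir" -type f -print0 | xargs -0 -n 32 -P "$jobs" md5sum
)
```

Output order is not deterministic, so sort the result before comparing it with anything. With very large batches, lines from concurrent md5sum processes could in principle interleave; writing each batch to its own file avoids that.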


Source: https://stackoverflow.com/questions/50070866/compare-checksum-of-files-between-two-servers-and-report-mismatch
