Using diff to find the portions of many files that are the same? (bizzaro-diff, or inverse-diff)

Posted by 依然范特西╮ on 2019-12-23 23:02:59

Question


Bizzaro-Diff!!!

Is there a way to do a bizzaro/inverse-diff that only displays the portions of a group of files that are the same? (I.e., across way more than three files.)

Odd question, I know...but I'm converting someone's ancient static pages to something a little more manageable.


Answer 1:


You want a clone detector. It detects similar code chunks across large source systems. See our CloneDR tool: http://www.semdesigns.com/Products/Clone/index.html




Answer 2:


You could try the comm command (for common). It'll only compare 2 files at a time, but you should be able to do 3+ with some clever scripting.
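As a rough illustration of the "clever scripting" part, here is a minimal Python sketch of the same idea extended to any number of files. It is not a wrapper around comm: it treats each file as a set of lines and intersects them, which is similar in spirit to chaining `comm -12` over sorted inputs (comm expects its inputs to be sorted). The function name `common_lines` is made up for this example, and the sketch assumes exact line-for-line matches are what you want.

```python
from functools import reduce
from pathlib import Path


def common_lines(paths):
    """Return the lines that occur in every file.

    Hypothetical helper, not part of comm. Lines are compared exactly,
    duplicates are collapsed, and the result follows the order in which
    the lines first appear in the first file.
    """
    line_lists = [
        Path(p).read_text(encoding="utf-8", errors="replace").splitlines()
        for p in paths
    ]
    if not line_lists:
        return []

    # Intersect the line sets of all files.
    shared = reduce(set.intersection, (set(lines) for lines in line_lists))

    # Report the shared lines in first-file order, each one once.
    seen = set()
    result = []
    for line in line_lists[0]:
        if line in shared and line not in seen:
            seen.add(line)
            result.append(line)
    return result
```

Calling `common_lines(["a.html", "b.html", "c.html"])` would give the lines present in all three files. Because it ignores where a line sits in each file, it can report a line as common even when it appears in very different places.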




Answer 3:


You could try sim. Been a few years since I've used it, but I recall it being very useful when looking for similarities within a file or in many different files.




Answer 4:


This is a classic problem.

If I had to do it quick-and-dirty, I'd probably run something like diff -U 1000000 (assuming a version of diff that supports it) and pipe the output through sed to keep just the lines in common (and strip the leading spaces). You'd have to loop through all the files, though.

Edit: I forgot there is also a Tcl implementation that would be slightly more versatile, but would require more coding. You may be able to find an implementation for your language of choice.
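For comparison, here is a minimal Python sketch of the loop-through-all-the-files idea using the standard library's difflib instead of diff and sed. It keeps the lines that two sequences have in common, in order, and then folds that result across the remaining files. The function names are invented for this example. Note that `SequenceMatcher` uses its own matching heuristic, so the result is not guaranteed to be the longest possible common subsequence, and folding pairwise can depend on the order of the files.

```python
from difflib import SequenceMatcher
from pathlib import Path


def common_in_order(a, b):
    """Return the lines of `a` that SequenceMatcher aligns with equal lines of `b`."""
    # autojunk=False stops difflib from ignoring very frequent lines
    # (for example repeated markup) in long inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel and adds nothing.
        shared.extend(a[block.a:block.a + block.size])
    return shared


def common_across_files(paths):
    """Fold common_in_order over every file (hypothetical helper)."""
    line_lists = [
        Path(p).read_text(encoding="utf-8", errors="replace").splitlines()
        for p in paths
    ]
    if not line_lists:
        return []
    common = line_lists[0]
    for lines in line_lists[1:]:
        common = common_in_order(common, lines)
    return common
```

Unlike a plain set intersection, this keeps repeated lines and respects their order, which tends to suit shared page templates such as headers and footers.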



Source: https://stackoverflow.com/questions/522221/using-diff-to-find-the-portions-of-many-files-that-are-the-same-bizzaro-diff
