How can I implement a diff
function, such as Stack Overflow\'s question revision history?
You have here a javascript example of the implementation of a diff algorithm.
Based on:
P. Heckel, A technique for isolating differences between files Comm. ACM, 21, (4), 264--268 (1978).
The implementation, itself, has two functions, one of which is recommended for use:
diffString( String oldFile, String newFile )
This method takes two strings and calculates the differences in each. The final result is the 'newFile' marked up with HTML (to signify both deletions from the oldFile and additions to the newFile).
I guess the only way would be to compare each character forming the 2 strings . Something like this :
void diff(String first,String second) {
int biggest = (first.length() > second.length()) ? first.length() : second.length();
for(int i = 0;i < biggest;i++) {
//compare each char from the longest string with each char from the shorter
// do something with them if they're not equal
}
}
This is just a sketch of how I would do it . Everything depends on what you want to do with the data .
If what you want is revision history, don't reinvent the wheel starting at diff. Just throw everything into version control and use its diff and logging facilities. For simple, linear history something as simple as RCS will do. Or you can throw the latest cannon at it and use git.
Most diff utilities do a line-by-line diff. Stack overflow does a word-by-word diff. For that something like wdiff is necessary. Most version control systems let you plug in the diff utility. Out of the box, git diff --color-words
comes remarkably close to what is done here. With a little fiddling with the settings you can probably get it to spit out something you can then make into a pretty web page.
I would find the code for the FreeBSD diff utility and use that as the baseline. There's no point in re-inventing wheels when the licence allows for this sort of copying.
Most algorithms are based on LCS: Longest common subsequence. It isn't obvious to implement it in an efficient way. You will probably find various implementations on the Net, for various languages.