The legacy project I am working on includes an external library in the form of a set of binary JAR files. We decided that for analysis and potential patching, we want to recei
There are a variety of JAR comparison tools out there. One that used to be pretty good is Jardiff. I haven't used it in a while, but I'm sure it's still available. There are also some commercial offerings in the same space that could fit your needs.
Jardiff, which Perception mentioned, is a good start. However, in theory there is no way to do this with 100% certainty, because the same source can be compiled with different compilers, compiler configurations, and optimization levels. So there is no reliable way to compare binary code (bytecode) beyond class and method signatures.
What do you mean by "similar implementation" of a method? Suppose a clever compiler drops an else case because it figures out that the condition can never be true. Are the two similar? Yes and no. :-)
The best way to go, IMHO, is to set up very good regression test cases that check every key feature of your libraries. This might be a horror, but in the long term it might be cheaper than hunting for bugs. It all depends on your future plans for this project. It is not a trivial or easy decision.
For method signatures, use a tool like jardiff.
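If a dedicated tool is not at hand, a rough signature comparison can be sketched with plain reflection. This is not how jardiff works internally; it is a swapped-in technique using `java.lang.reflect`. The class name `SignatureDiff` and both helper methods are hypothetical, and the sketch assumes you have already loaded the two versions of a class (for example through two separate `URLClassLoader` instances, one per JAR).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of jardiff or any other tool.
class SignatureDiff {

    // Builds one string per declared method (private ones included),
    // e.g. "public static int max(int, int)".
    static Set<String> signaturesOf(Class<?> type) {
        Set<String> result = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getTypeName)
                    .collect(Collectors.joining(", "));
            // Mask out bits such as "bridge" or "varargs" that are not source-level modifiers.
            String modifiers = Modifier.toString(m.getModifiers() & Modifier.methodModifiers());
            result.add((modifiers + " " + m.getReturnType().getTypeName()
                    + " " + m.getName() + "(" + params + ")").trim());
        }
        return result;
    }

    // Signatures present in 'from' but absent in 'other'.
    static Set<String> missingIn(Set<String> from, Set<String> other) {
        Set<String> diff = new TreeSet<>(from);
        diff.removeAll(other);
        return diff;
    }
}
```

Calling `missingIn(signaturesOf(oldClass), signaturesOf(newClass))` and then the reverse gives the removed and the added methods. It says nothing about method bodies, which is exactly the limitation discussed above.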
For similarity of implementation, you have to fall back on an educated guess. Comparing the bytecode at the opcode level may be compiler-dependent and lead to a large number of false negatives (methods reported as different although they were compiled from the same source). If that is the case, you could fall back to comparing the methods of a class using the LineNumberTable.
It gives you a list of line numbers for each method (as long as the class file was compiled with debug information, which is often missing in very old or commercial libraries).
If two class files are compiled from the same source code, then at least the line numbers of each method should match exactly.
You can use a library such as Apache BCEL to retrieve the LineNumberTable:
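import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.LineNumber;
import org.apache.bcel.classfile.LineNumberTable;
import org.apache.bcel.classfile.Method;

// parse() can throw IOException or ClassFormatException
JavaClass fooClazz = new ClassParser( "Foo.class" ).parse();
for( Method m : fooClazz.getMethods() )
{
    LineNumberTable lnt = m.getLineNumberTable();
    if( lnt == null )
    {
        continue; // no code (abstract/native) or compiled without debug info
    }
    for( LineNumber ln : lnt.getLineNumberTable() )
    {
        System.out.println( ln.getLineNumber() );
    }
}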
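Once the line numbers have been collected for both versions of a class, the comparison itself is simple. The following is a minimal sketch that assumes the tables were gathered into maps keyed by method name plus descriptor (with BCEL, something like `m.getName() + m.getSignature()`), so that overloads stay distinct. The class name `LineNumberComparison` and the sample data are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the maps are assumed to be filled from the LineNumberTables.
class LineNumberComparison {

    // Returns the methods whose line-number lists differ, including
    // methods that exist in only one of the two maps.
    static Set<String> mismatchedMethods(Map<String, List<Integer>> oldLines,
                                         Map<String, List<Integer>> newLines) {
        Set<String> names = new TreeSet<>(oldLines.keySet());
        names.addAll(newLines.keySet());
        Set<String> mismatched = new TreeSet<>();
        for (String name : names) {
            List<Integer> a = oldLines.get(name);
            List<Integer> b = newLines.get(name);
            if (a == null || !a.equals(b)) {
                mismatched.add(name);
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two extracted tables.
        Map<String, List<Integer>> fromJar = Map.of(
                "parse(Ljava/lang/String;)V", List.of(10, 11, 14),
                "close()V", List.of(20, 21));
        Map<String, List<Integer>> fromSource = Map.of(
                "parse(Ljava/lang/String;)V", List.of(10, 11, 15),
                "close()V", List.of(20, 21));
        System.out.println(mismatchedMethods(fromJar, fromSource));
        // prints [parse(Ljava/lang/String;)V]
    }
}
```

An empty result is only a hint that the sources match, not proof: two different method bodies can occupy the same lines.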
I suggest a multi-stage process:
1. Apply the previously suggested Jardiff or a similar tool to see if there are any API differences. If possible, pick a tool that has an option for reporting private methods and the like. In practice, any substantial implementation change in Java is likely to change some methods and classes, even if the public API is unchanged.
2. If you have an API match, compile a few randomly selected files with the indicated compiler, decompile both the result and the original class files, and compare the output. If they match, apply the same process to larger and larger bodies of code until you either find a mismatch or have checked everything. Diffs of decompiled code are more likely to give you clues about the nature of the differences, and are easier to filter for non-significant differences, than diffs of the actual class files.
3. If you get a mismatch, analyze it. It may be due to something you do not care about. If so, try to construct a script that removes that kind of difference, and resume the compile-and-compare process. If you get widespread mismatches, experiment with compiler parameters such as optimization. If adjusting the compiler parameters eliminates the differences, continue with the bulk comparison. The objective in this phase is to find a combination of compiler parameters and decompiled-code filters that produces a match on the sample files, and then to apply it to a bulk comparison of the library.
4. If you cannot get a reasonably close match in the decompiled code, you probably do not have the right source code. Even so, if you have an API match, it may be worth building your system and running your tests against the result of the compilation. If your tests run at least as well with the version you built from source, continue working with it.
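The filtering script from the mismatch-analysis stage could start out as small as the following sketch. It assumes the only differences you have decided to ignore are blank lines, whole-line `//` comments (such as decompiler banners), and whitespace. The class name `DecompiledSourceFilter` is hypothetical, and real decompiler output will likely need more rules than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter; the rules below are assumptions about what is "non-significant".
class DecompiledSourceFilter {

    // Drops blank lines and whole-line "//" comments, and collapses runs of
    // whitespace, so purely cosmetic differences do not show up in the diff.
    static List<String> normalize(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String s = line.trim().replaceAll("\\s+", " ");
            if (s.isEmpty() || s.startsWith("//")) {
                continue;
            }
            out.add(s);
        }
        return out;
    }

    // True if the two decompiled sources are equal once the filter is applied.
    static boolean sameAfterFiltering(List<String> a, List<String> b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Reading the two files with `Files.readAllLines` and passing them to `sameAfterFiltering` gives a yes/no answer per class. For the mismatches, feed the normalized lines to an ordinary diff tool to see what actually changed.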