I've become more and more comfortable programming in Java than with C++ or C. I am hoping to get a sense of the performance hit incurred using a JVM interpreter, as oppose
A lot of people underestimate the performance of Java. I was once curious about this as well and wrote a simple program in Java and then an equivalent in C (not much more than doing some operation with a for loop and a massive array). I don't recall exact figures, but I do know that Java beat C when the C program was compiled without any optimization flags (under gcc). As expected, C pulled ahead when I finally compiled it with aggressive optimization. To be honest, it wasn't a scientific experiment by any means, but it did give me a baseline for where Java stood.
Of course, Java probably falls further behind when you start doing things that require system calls. That said, I have seen 100MB/s read performance from disks and the network with Java programs running on modest hardware. Not sure what that says exactly, but it does indicate to me that it's good enough for pretty much anything I'll need it for.
As for threads, if your Java program creates 2 threads, then you have 2 real threads.
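For what it's worth, the Java side of that kind of experiment might look roughly like the sketch below. The original program is long gone, so the class name, array size and fill pattern are all made up for illustration. It runs several rounds because the first ones include JIT warm-up.

```java
// Hypothetical micro-benchmark; names and sizes are illustrative, not the original program.
final class ArraySumBench {
    // Build a large array with a simple repeating pattern.
    static int[] makeData(int size) {
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = i % 100;
        }
        return data;
    }

    // The "operation in a for loop": sum every element.
    static long sumArray(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = makeData(10_000_000);
        // Time several rounds; early rounds may run before the JIT has compiled the loop.
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            long sum = sumArray(data);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("round " + round + ": sum=" + sum + " in " + elapsedMs + " ms");
        }
    }
}
```

The numbers from something like this only say how that one loop behaves on one JVM and one machine, which is about all my original experiment showed too.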
There isn't an easy answer to this. Writing C style C++ is possible (even a good idea) but once you try to do inheritance in C, things get ugly. So ignore C and go with Java -vs- C++ since they are closer to one another.
To get a real sense of it you would need to write two relatively large applications in a similar manner in both languages. If you do that, do you use the STL and the Java collection classes, or do you write your own and port them between languages? If you use the native ones, the result depends on which library implementation is faster, whereas if you use your own you are not testing the real speed of the application.
I'd say you would need to write the applications as similarly as possible but use the language-specific libraries/idioms where it makes sense. C++ and Java code, while being similar, have different ways of doing things - something that is easy in Java may be terribly hard in C++ and vice versa.
A modern GC implementation doesn't add that much overhead, and you can switch to a GC in C++ to do the comparison if you like :-)
There are some things that the Java runtime can do that are not generally done by C++ compilers, such as inlining virtual methods.
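To make the virtual-method point concrete, here is a small sketch with made-up names (Shape, Square, VirtualCallSketch). In the bytecode the call to area() is an ordinary interface call, but a JIT that observes only one receiver type at that call site is free to inline it. Whether a given JVM actually does so here is an assumption; this code does not verify it.

```java
// Illustrative types only; nothing here comes from the original post.
interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

final class VirtualCallSketch {
    // s.area() is dispatched through the Shape interface.
    // With a single implementation in play, a JIT may inline the call at run time.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000_000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Square(2.0);
        }
        System.out.println("total area = " + totalArea(shapes));
    }
}
```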
For system type things Java typically resorts to making calls into C so there is overhead there (though JNI is faster than it used to be).
Threading depends on the implementation. Sun used to use "green threads" for Solaris, but that is long gone. As far as I know most (all?) modern VMs use native threads.
In short I don't think there is a good metric on the % overhead for Java -vs- C++, and any that you find are likely to be micro benchmarks that do not represent the real world (unfortunately).
To address each of your points:
As your objective is very modest ("I am hoping to get a sense of the performance hit...") you should be able to fulfill most of it by examining the programs and measurements shown in the Computer Language Benchmarks Game.
As you know both Java and C++
you can look at the program source code and decide for yourself which of the Java programs are reasonable to compare with which of the C and C++ programs
you can look at the dozen different tasks and decide for yourself which of them exercise your idea of "the most basic features of each language"
you can look at the different approaches to multicore, or programs forced onto one core
you can check how much JVM startup might or might not affect those measurements
But you do have to think about whether measurements of tiny programs can plausibly indicate the likely performance of your application.
Java isn't an interpreted language, and hasn't been for several versions. The Java bytecode is JIT'ed on the fly. (Technically it still interprets some of the code, but anything that matters performance-wise gets JIT'ed)
As for performance, what on Earth gives you the crazy idea that "there is a baseline for overhead"? There isn't. There never was and never will be. Not between C++ and Java, and not between Python and Javascript, or any other two languages. There are things that your specific version of the JVM will do faster than your specific C++ compiler, and things that your specific C++ compiler will do better than your specific JVM.
So the "overhead" of your choice of language depends entirely on 1) what you want your code to do, and 2) how you write your code.
If you take a Java program and translate it to C++, the result will almost certainly run slower.
If you take a C++ program and translate it to Java, that too will run slower.
Not because one language is "faster" than the other, but because the original program was written for one language, and was tailored to work well in that language. And any attempt to translate it to another language will lose this advantage. You end up with a C++-style Java program, which won't run efficiently on the JVM, or a Java-style C++ program, which will run terribly as well.
Neither language specification contains a clause that "and the result must be at least x% slower than language y". Both your C++ compiler and the JVM do their very best to make things go fast.
And the performance characteristics you're seeing today may change tomorrow. Languages don't have a speed.
But to answer your specific questions:
There must be some baseline for overhead when using an interpreter. Is there some general rule of thumb to remember? 10% 15%? I have read the occasional blog stating that Java code is nearly as fast as native code, but that may have been biased.
As said above, it depends. For many common tasks, you typically won't see more than a few percent difference either way. For some use cases, you'll see a larger difference (going either way). Both languages have advantages when it comes to performance. There is some overhead associated with the JVM, but there are also huge optimization opportunities, not least the garbage collector.
Does the JVM garbage collector add significant overhead to runtime performance? I know Cocoa applications have begun to use a garbage collection model, and I agree that it makes programming a lot simpler, but at what cost?
Basically none. On average, a garbage collector is far faster than manual memory management, for many reasons:
The main problem with a GC is that while on average a garbage collector performs better, you lose some control over when to take the performance cost. Manual memory management ensures your thread won't ever be halted while waiting for memory to be cleaned up. A garbage collector can, at almost any time, decide to pause the process and clean up memory. In almost all cases, this is fast enough to be no problem, but for vital real-time stuff, it is a problem.
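If you want a rough feel for how much time the collector takes in your own program, the standard java.lang.management API exposes per-collector counters. The sketch below uses made-up names (GcOverheadSketch, churn) and an arbitrary allocation pattern. Note that getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors is not the same thing as pause time, and the JIT may optimize some of these allocations away.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; only the java.lang.management calls are real API.
final class GcOverheadSketch {
    // Sum the approximate collection time (ms) reported by all collectors so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Allocate many short-lived arrays to give the collector something to do.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        long checksum = churn(5_000_000);
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.println("checksum=" + checksum + ", wall=" + wallMs + " ms, reported GC=" + gcMs + " ms");
    }
}
```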
(An additional problem is that you lose a bit of expressiveness. In C++, RAII is used to manage all sorts of resources. In Java, you can't use RAII. Instead the GC handles memory for you, and for all other resources, you're screwed, and have to do it yourself with lots of try/finally blocks. There is no reason why RAII couldn't be implemented in a GC'ed language, but it's not available in either Java or C#)
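For anyone who hasn't seen the try/finally pattern, a minimal sketch follows, with a made-up class name and a StringReader standing in for a real resource such as a file or socket. The second method uses try-with-resources, which Java 7 later added to shorten exactly this pattern. It is still scoped cleanup at the use site rather than C++-style RAII tied to object lifetime.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Illustrative only; the StringReader stands in for a real resource.
final class TryFinallySketch {
    // Classic pattern: the caller must remember to close in a finally block.
    static String firstLineClassic(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            return reader.readLine();
        } finally {
            reader.close(); // runs on normal return and when an exception is thrown
        }
    }

    // Java 7 and later: try-with-resources closes the reader automatically.
    static String firstLineWithResources(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLineClassic("hello\nworld"));
        System.out.println(firstLineWithResources("hello\nworld"));
    }
}
```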
What is the overhead of making system calls from Java? For example creating a Socket object as opposed to the C socket API.
Roughly the same. Why would it be different? Of course, Java has to invoke the relevant OS services and APIs, so there is a tiny bit of overhead, but it is really nothing you're likely to notice.
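As a rough illustration of what "creating a Socket object" amounts to, the sketch below opens a loopback TCP connection and pushes one byte through it. The class and method names are made up, and it assumes loopback networking is available. Each of these calls ends up in the same OS socket facilities that a C program would use directly.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical example class; java.net types are the standard library API.
final class SocketSketch {
    // Send one byte over a loopback TCP connection and return what the accepting side reads.
    static int roundTrip(int value) throws IOException {
        // Port 0 asks the OS for any free port; the client connects before accept()
        // is called, which works because the connection waits in the listen backlog.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.getOutputStream().write(value); // writes the low 8 bits
            client.getOutputStream().flush();
            return accepted.getInputStream().read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("received " + roundTrip(42));
    }
}
```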
Finally, I recall reading somewhere that the JVM implementation is single threaded. If this is true (which I am skeptical about), does that mean that Java threads really aren't true threads? Does a Java thread, in general, correspond to an underlying kernel-provided thread? Does a Java application benefit in the same way a native application would from multiple cores / multiple CPUs?
Java can use multiple threads, yes. The JVM itself might be single-threaded (in the sense that all the JVM services run on the same thread), I don't know about that. But your Java application can use as many threads as it likes, and they are mapped to OS threads and will use multiple cores.
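A minimal sketch of that, with made-up names and an arbitrary workload: it starts one plain java.lang.Thread per available processor and combines their results. On current mainstream JVMs these threads are backed by OS threads, so the workers can run on separate cores.

```java
// Illustrative example; the workload (summing 1..perThread) is arbitrary.
final class ThreadsSketch {
    static long parallelSum(int threads, int perThread) throws InterruptedException {
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long s = 0;
                for (int i = 1; i <= perThread; i++) {
                    s += i;
                }
                partial[id] = s; // each worker writes only its own slot
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            workers[t].join(); // join() makes the worker's write visible to this thread
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + ", total=" + parallelSum(cores, 100_000_000));
    }
}
```

Watching this in a process monitor while it runs is a quick way to see whether several cores are busy at once.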
http://www.w3sys.com/pages.meta/benchmarks.html
http://www.freewebs.com/godaves/javabench_revisited/
http://en.wikipedia.org/wiki/Comparison_of_Java_and_C%2B%2B#Performance
http://blog.dhananjaynene.com/2008/07/performance-comparison-c-java-python-ruby-jython-jruby-groovy/
http://www.irrlicht3d.org/pivot/entry.php?id=446
And so on. The fact is - it doesn't matter. Bottlenecks and slow software are created by the developers, not by the language (at least nowadays).