For simplicity, imagine this scenario: we have a 2-bit computer that has a pair of 2-bit registers called r1 and r2 and only works with immediate addressing.
Lets
Simplifying, an interpreter is an infinite loop with a giant switch inside. It reads Java bytecode (or some internal representation) and emulates a CPU executing it. This way the real CPU executes the interpreter code, which emulates the virtual CPU. This is painfully slow. A single virtual instruction adding two numbers requires three function calls and many other operations, so each virtual instruction takes several real instructions to execute. It is also less memory efficient, as you have both real and emulated stacks, registers and instruction pointers.
while (true) {
    Operation op = methodByteCode.get(instructionPointer);
    switch (op) {
        case ADD:
            // pop both operands, push their sum
            stack.pushInt(stack.popInt() + stack.popInt());
            instructionPointer++;
            break;
        case STORE:
            // arguments are evaluated left to right, so the first pop is the first argument
            memory.set(stack.popInt(), stack.popInt());
            instructionPointer++;
            break;
        ...
    }
}
When a method has been interpreted multiple times, the JIT compiler kicks in. It reads all the virtual instructions and generates one or more native instructions that do the same thing. Here I'm generating a string of textual assembly, which would still need an additional assembly-to-native-binary conversion.
for (Operation op : methodByteCode) {
    switch (op) {
        case ADD:
            compiledCode += "popi r1\n";
            compiledCode += "popi r2\n";
            compiledCode += "addi r1, r2, r3\n";
            compiledCode += "pushi r3\n";
            break;
        case STORE:
            compiledCode += "popi r1\n";
            compiledCode += "storei r1\n";
            break;
        ...
    }
}
After the native code is generated, the JVM copies it somewhere, marks that region as executable and instructs the interpreter to invoke it instead of interpreting the bytecode the next time this method is invoked. A single virtual instruction might still take more than one native instruction, but this will be nearly as fast as ahead-of-time compilation to native code (as in C or C++). Compilation is usually much slower than interpreting, but it has to be done only once and only for chosen methods.
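The hand-over from interpreter to compiled code described above can be sketched in a few lines of Java. This is only an illustration under loud assumptions: the `Op` instruction set, the `COMPILE_THRESHOLD` value and the class name are made up for this example, and the "compiled" form is just a chain of pre-decoded lambdas standing in for native code, since plain Java cannot emit and run machine code directly.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

public class TieredMethod {
    // Hypothetical instruction set, invented for this sketch.
    enum Op { PUSH_1, PUSH_2, ADD }

    // Hypothetical threshold; real JVMs use their own, tunable counters.
    static final int COMPILE_THRESHOLD = 3;

    private final List<Op> byteCode;
    private int invocations = 0;
    private Consumer<Deque<Integer>> compiled; // stays null until the method is "hot"

    TieredMethod(List<Op> byteCode) {
        this.byteCode = byteCode;
    }

    boolean isCompiled() {
        return compiled != null;
    }

    int invoke() {
        Deque<Integer> stack = new ArrayDeque<>();
        if (compiled != null) {
            compiled.accept(stack);           // fast path: no per-instruction switch
        } else {
            interpret(byteCode, stack);       // slow path: decode every instruction
            if (++invocations >= COMPILE_THRESHOLD) {
                compiled = compile(byteCode); // used from the next invocation on
            }
        }
        return stack.pop();
    }

    // The interpreter: a loop with a switch, run on every invocation.
    static void interpret(List<Op> code, Deque<Integer> stack) {
        for (Op op : code) {
            switch (op) {
                case PUSH_1: stack.push(1); break;
                case PUSH_2: stack.push(2); break;
                case ADD:    stack.push(stack.pop() + stack.pop()); break;
            }
        }
    }

    // The stand-in "compiler": the switch runs once per instruction, ever.
    static Consumer<Deque<Integer>> compile(List<Op> code) {
        Consumer<Deque<Integer>> result = s -> { };
        for (Op op : code) {
            Consumer<Deque<Integer>> next;
            switch (op) {
                case PUSH_1: next = s -> s.push(1); break;
                case PUSH_2: next = s -> s.push(2); break;
                case ADD:    next = s -> s.push(s.pop() + s.pop()); break;
                default: throw new IllegalArgumentException("unknown op " + op);
            }
            result = result.andThen(next);
        }
        return result;
    }

    public static void main(String[] args) {
        TieredMethod m = new TieredMethod(List.of(Op.PUSH_1, Op.PUSH_2, Op.ADD));
        for (int i = 1; i <= 5; i++) {
            int value = m.invoke();
            System.out.println("call " + i + ": result=" + value + ", compiled=" + m.isCompiled());
        }
    }
}
```

The point of the design is that `compile` pays the decoding cost once, while `interpret` pays it on every call; a real JIT additionally turns the result into machine code, which this sketch does not attempt.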
One of the important steps in Java is that the compiler first translates the .java code into a .class file, which contains the Java bytecode. This is useful, as you can take .class files and run them on any machine that understands this intermediate language, by translating it on the spot instruction-by-instruction or chunk-by-chunk. This is one of the most important functions of the Java compiler + interpreter combination. You can compile Java source code directly to a native binary, but this negates the idea of writing the original code once and being able to run it anywhere, because the compiled native binary will only run on the hardware/OS architecture it was compiled for. If you want to run it on another architecture, you have to recompile the source for that one. With compilation to intermediate-level bytecode, you don't need to drag around the source code, only the bytecode. The trade-off is that you now need a JVM that can interpret and run the bytecode. As such, compiling to the intermediate-level bytecode, which the interpreter then runs, is an integral part of the process.
As for the actual runtime execution of the code: yes, the JVM will eventually run some binary code that may or may not be identical to natively compiled code. In a one-line example, they may seem superficially the same. But the interpreter typically doesn't precompile everything; it goes through the bytecode and translates it to binary instruction-by-instruction or chunk-by-chunk. There are pros and cons to this (compared to natively compiled code, e.g. C and C compilers), and there are lots of resources online to read up on further.
Not all computers have the same instruction set. Java bytecode is a kind of Esperanto - an artificial language to improve communication. The Java VM translates the universal Java bytecode to the instruction set of the computer it runs on.
So how does the JIT figure in here? The main purpose of the JIT compiler is optimization. There are often different ways to translate a given piece of bytecode into the target machine code. The best-performing translation is often non-obvious because it can depend on the data. There are also limits to how far a program can analyze an algorithm without executing it; the halting problem is a well-known such limitation, but not the only one. So the JIT compiler relies on what is observed at run time: the JVM watches how the code behaves with the real-world data the program processes, and the JIT uses those observations to choose a translation, recompiling if a better one becomes apparent. That is why it takes a number of executions until the JIT compiler has settled on a good translation.
The CPU architecture you describe is unfortunately too restricted to make this really clear with all the intermediate steps. Instead, I will write pseudo-C and pseudo-x86-assembler, hopefully in a way that is clear without being terribly familiar with C or x86.
The compiled JVM bytecode might look something like this:
ldc 0     # push the first constant (== 1)
ldc 1     # push the second constant (== 2)
iadd      # pop two integers and push their sum
istore_0  # pop result and store in local variable 0
The interpreter has (a binary encoding of) these instructions in an array, and an index referring to the current instruction. It also has an array of constants, and a memory region used as stack and one for local variables. Then the interpreter loop looks like this:
while (true) {
    switch (instructions[pc]) {
    case LDC:
        sp += 1; // make space for constant
        stack[sp] = constants[instructions[pc+1]];
        pc += 2; // two-byte instruction
        break;
    case IADD:
        stack[sp-1] += stack[sp]; // add to first operand
        sp -= 1; // pop other operand
        pc += 1; // one-byte instruction
        break;
    case ISTORE_0:
        locals[0] = stack[sp];
        sp -= 1; // pop
        pc += 1; // one-byte instruction
        break;
    // ... other cases ...
    }
}
This C code is compiled into machine code and run. As you can see, it's highly dynamic: it inspects each bytecode instruction every time that instruction is executed, and all values go through the stack (i.e. RAM).
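For anyone who would rather run the loop than read pseudo-C, here is a minimal Java sketch of the same interpreter executing the four-instruction program from above. The opcode values, the extra `RETURN` instruction that ends the loop, and the fixed array sizes are assumptions made for this example; they are not the real JVM encodings.

```java
public class TinyInterpreter {
    // Hypothetical one-byte opcodes, chosen for this sketch only.
    static final byte LDC = 0, IADD = 1, ISTORE_0 = 2, RETURN = 3;

    // Runs the program and returns the local variable array.
    static int[] run(byte[] instructions, int[] constants) {
        int[] stack = new int[16];  // operand stack (size is arbitrary here)
        int[] locals = new int[4];  // local variables (size is arbitrary here)
        int sp = -1;                // index of the top of the stack, -1 when empty
        int pc = 0;                 // index of the current instruction
        while (true) {
            switch (instructions[pc]) {
                case LDC:
                    sp += 1;                                      // make space for constant
                    stack[sp] = constants[instructions[pc + 1]];  // operand byte is the pool index
                    pc += 2;                                      // two-byte instruction
                    break;
                case IADD:
                    stack[sp - 1] += stack[sp];                   // add to first operand
                    sp -= 1;                                      // pop other operand
                    pc += 1;
                    break;
                case ISTORE_0:
                    locals[0] = stack[sp];
                    sp -= 1;                                      // pop
                    pc += 1;
                    break;
                case RETURN:
                    return locals;
                default:
                    throw new IllegalStateException("unknown opcode " + instructions[pc]);
            }
        }
    }

    public static void main(String[] args) {
        byte[] program = { LDC, 0, LDC, 1, IADD, ISTORE_0, RETURN };
        int[] constants = { 1, 2 };
        System.out.println(run(program, constants)[0]); // prints 3
    }
}
```

Note that the switch is evaluated again for every instruction on every run, which is exactly the overhead discussed here.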
While the actual addition itself probably happens in a register, the code surrounding the addition is rather different from what a Java-to-machine-code compiler would emit. Here's an excerpt of what a C compiler might turn the interpreter loop into (pseudo-x86):
# pseudo-x86, operands written destination first
.ldc:
    incl %esi                      # increment the variable pc, first half of pc += 2;
    movb %ecx, program(%esi)       # load byte after instruction
    movl %eax, constants(,%ecx,4)  # load constant from pool
    incl %edi                      # increment sp
    movl stack(,%edi,4), %eax      # write constant onto stack
    incl %esi                      # other half of pc += 2
    jmp .EndOfSwitch
.iadd:
    movl %eax, stack(,%edi,4)      # load top operand
    decl %edi                      # sp -= 1;
    addl stack(,%edi,4), %eax      # add it to the operand below
    incl %esi                      # pc += 1;
    jmp .EndOfSwitch
You can see that the operands for the addition come from memory instead of being hardcoded, even though for the purposes of the Java program they are constant. That's because for the interpreter, they are not constant. The interpreter is compiled once and then must be able to execute all sorts of programs, without generating specialized code.
The purpose of the JIT compiler is to do just that: generate specialized code. A JIT can analyze the ways the stack is used to transfer data, the actual values of various constants in the program, and the sequence of calculations performed, to generate code that more efficiently does the same thing. In our example program, it would allocate the local variable 0 to a register, replace the access to the constant table with moving constants into registers (movl %eax, $1), and redirect the stack accesses to the right machine registers. Ignoring a few more optimizations (copy propagation, constant folding and dead code elimination) that would normally be done, it might end up with code like this:
movl %ebx, $1    # ldc 0
movl %ecx, $2    # ldc 1
movl %eax, %ebx  # (1/2) iadd
addl %eax, %ecx  # (2/2) iadd
# no istore_0, local variable 0 == %eax, so we're done
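One way to picture how a translator arrives at such code is to model the operand stack at compile time: instead of pushing values, it pushes the names of the registers that will hold them. The Java sketch below does that for the example program and emits the same destination-first pseudo-assembly as text. Everything in it is an assumption for illustration: the opcode values, the register pool (ordered only so that the output lines up with the listing above), and the lack of register reuse or spilling, which a real JIT would have to handle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StackToRegisters {
    // Hypothetical one-byte opcodes, chosen for this sketch only.
    static final byte LDC = 0, IADD = 1, ISTORE_0 = 2;

    static List<String> translate(byte[] code, int[] constants) {
        // Free registers; this naive allocator never returns one to the pool
        // and simply fails if it runs out.
        Deque<String> free = new ArrayDeque<>(List.of("%ebx", "%ecx", "%eax", "%edx"));
        // Compile-time model of the operand stack: it holds register names, not values.
        Deque<String> operands = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            switch (code[pc]) {
                case LDC: {
                    String reg = free.removeFirst();
                    // The constant is known now, so it becomes an immediate operand.
                    out.add("movl " + reg + ", $" + constants[code[pc + 1]]);
                    operands.push(reg);
                    pc += 2;
                    break;
                }
                case IADD: {
                    String right = operands.pop();
                    String left = operands.pop();
                    String result = free.removeFirst();
                    out.add("movl " + result + ", " + left);
                    out.add("addl " + result + ", " + right);
                    operands.push(result);
                    pc += 1;
                    break;
                }
                case ISTORE_0: {
                    // No instruction needed: the local simply lives in this register.
                    String reg = operands.pop();
                    out.add("# local variable 0 lives in " + reg);
                    pc += 1;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown opcode " + code[pc]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] program = { LDC, 0, LDC, 1, IADD, ISTORE_0 };
        for (String line : translate(program, new int[] { 1, 2 })) {
            System.out.println(line);
        }
    }
}
```

The switch over opcodes still exists, but it runs once at translation time; the emitted code contains no stack accesses and no constant-pool lookups.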