Why are compiled Java class files smaller than C compiled files?

被撕碎了的回忆 2020-12-16 17:16

I would like to know why the .o file we get from compiling a .c file that prints "Hello, World!" is larger than a Java .class file that also prints "Hello, World!".

9 Answers
  • 2020-12-16 17:55

    C programs, even though they are compiled to native machine code that runs directly on your processor (dispatched through the OS, of course), tend to need a lot of setup and teardown for the operating system, such as loading dynamically linked libraries like the C library.

    Java, on the other hand, compiles to bytecode for a virtual platform (basically a simulated computer within a computer) that was designed alongside Java itself. Much of this overhead (if it is even necessary, since both the code and the VM interface are well defined) can therefore move into the VM, leaving the program code lean.

    It varies from compiler to compiler, though, and there are several options to reduce the size or build the code differently, each with different effects.

    All this said, it's not really that important.

  • 2020-12-16 17:56

    Java uses bytecode to be platform independent and "precompiled", but bytecode is consumed by an interpreter and is designed to be compact, so it is not the same as the machine code you see in a compiled C program. Take a look at the full Java compilation process:

    Java program  
    -> Bytecode   
      -> High-level Intermediate Representation (HIR)   
        -> Middle-level Intermediate Representation (MIR)   
          -> Low-level Intermediate Representation (LIR)  
            -> Register allocation
              -> EMIT (Machine Code)
    

    This is the chain that transforms a Java program into machine code. As you can see, bytecode is a long way from machine code. I couldn't find good material on the Internet that walks through this chain for a real program; all I found is this presentation, where you can see how each step changes the code representation. I hope it explains how and why a compiled C program and Java bytecode differ.

    UPDATE: All steps after "bytecode" are performed by the JVM at runtime, depending on whether it decides to compile that code (that's another story: the JVM balances interpreting bytecode against compiling it to native, platform-dependent code).

    I finally found a good example, taken from Linear Scan Register Allocation for the Java HotSpot™ Client Compiler (by the way, good reading for understanding what goes on inside the JVM). Imagine we have this Java program:

    public static void fibonacci() {
      int lo = 0;
      int hi = 1;
      while (hi < 10000) {
        hi = hi + lo;
        lo = hi - lo;
        print(lo);
      }
    }
    

    then its bytecode is:

    0:  iconst_0
    1:  istore_0 // lo = 0
    2:  iconst_1
    3:  istore_1 // hi = 1
    4:  iload_1
    5:  sipush 10000
    8:  if_icmpge 26 // while (hi < 10000)
    11: iload_1
    12: iload_0
    13: iadd
    14: istore_1 // hi = hi + lo
    15: iload_1
    16: iload_0
    17: isub
    18: istore_0 // lo = hi - lo
    19: iload_0
    20: invokestatic #12 // print(lo)
    23: goto 4 // end of while-loop
    26: return
    

    Each instruction takes 1 byte for the opcode (so the JVM can encode up to 256 instructions, though it actually defines fewer than that) plus its operands. Together the method takes 27 bytes. Skipping all the intermediate stages, here is the ready-to-execute machine code:

    00000000: mov dword ptr [esp-3000h], eax
    00000007: push ebp
    00000008: mov ebp, esp
    0000000a: sub esp, 18h
    0000000d: mov esi, 1h
    00000012: mov edi, 0h
    00000017: nop
    00000018: cmp esi, 2710h
    0000001e: jge 00000049
    00000024: add esi, edi
    00000026: mov ebx, esi
    00000028: sub ebx, edi
    0000002a: mov dword ptr [esp], ebx
    0000002d: mov dword ptr [ebp-8h], ebx
    00000030: mov dword ptr [ebp-4h], esi
    00000033: call 00a50d40
    00000038: mov esi, dword ptr [ebp-4h]
    0000003b: mov edi, dword ptr [ebp-8h]
    0000003e: test dword ptr [370000h], eax
    00000044: jmp 00000018
    00000049: mov esp, ebp
    0000004b: pop ebp
    0000004c: test dword ptr [370000h], eax
    00000052: ret
    

    It takes 83 bytes in total (the last instruction starts at offset 52 hex = 82 and is 1 byte long).
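The two byte counts can be re-derived from the listings. The sketch below is only a tally, not a disassembler: the class name `CodeSizeTally` and its helpers are made up for illustration, and the per-instruction lengths are read off the offsets in the bytecode listing (four instructions carry a 2-byte operand, the rest are a bare opcode).

```java
class CodeSizeTally {
    // Mnemonics of the fibonacci() bytecode listing, in order (names only, no operands).
    static final String[] LISTING = {
        "iconst_0", "istore_0", "iconst_1", "istore_1", "iload_1",
        "sipush", "if_icmpge", "iload_1", "iload_0", "iadd", "istore_1",
        "iload_1", "iload_0", "isub", "istore_0", "iload_0",
        "invokestatic", "goto", "return"
    };

    // Encoded length of one instruction: 1 opcode byte plus any operand bytes.
    static int length(String mnemonic) {
        switch (mnemonic) {
            case "sipush":
            case "if_icmpge":
            case "invokestatic":
            case "goto":
                return 3; // opcode + 2-byte operand
            default:
                return 1; // opcode only, operands are implicit
        }
    }

    // Sum of the encoded lengths of every instruction in the listing.
    static int bytecodeSize() {
        int total = 0;
        for (String mnemonic : LISTING) {
            total += length(mnemonic);
        }
        return total;
    }

    // The last x86 instruction (ret) starts at offset 0x52 and is 1 byte long.
    static int machineCodeSize() {
        return 0x52 + 1;
    }

    public static void main(String[] args) {
        System.out.println("bytecode bytes:     " + bytecodeSize());
        System.out.println("machine code bytes: " + machineCodeSize());
    }
}
```

For this one method the machine code comes out at roughly three times the size of the bytecode.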

    PS. I am not taking linking into account (others have mentioned it), nor the headers of the compiled C file and the bytecode file (they probably differ too; I don't know how it works for C, but in a bytecode file all strings are moved into a special constant pool in the header, and the program refers to them by their position in that pool).
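As a rough illustration of the "strings live in a pool and code refers to them by position" idea, here is a heavily simplified sketch. It only models `CONSTANT_Utf8` entries (tag 1, a 2-byte length, then the bytes); a real class file's constant pool has many other entry kinds and sits after the magic number and version fields. The class `ConstantPoolSketch` and its methods are hypothetical names, not part of any JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ConstantPoolSketch {
    static final int CONSTANT_UTF8 = 1; // tag used for string data in a class file

    // Builds an illustrative pool: a 2-byte count followed by Utf8 entries.
    static byte[] buildPool(String... strings) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(strings.length + 1); // the count is number of entries + 1
            for (String s : strings) {
                out.writeByte(CONSTANT_UTF8);
                out.writeUTF(s); // writes a 2-byte length, then modified UTF-8 bytes
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resolves entry #index (1-based), the way an instruction operand such as #12 would.
    static String lookup(byte[] pool, int index) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(pool));
            int count = in.readUnsignedShort();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                if (tag != CONSTANT_UTF8) {
                    throw new IllegalArgumentException("unsupported tag " + tag);
                }
                String value = in.readUTF();
                if (i == index) {
                    return value;
                }
            }
            throw new IllegalArgumentException("no entry #" + index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pool = buildPool("Hello, World!", "print");
        System.out.println("#1 = " + lookup(pool, 1));
        System.out.println("#2 = " + lookup(pool, 2));
        System.out.println("pool size in bytes = " + pool.length);
    }
}
```

Each string is stored once in the pool, and every use in the code costs only a 2-byte index.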

    UPDATE2: It is probably worth mentioning that Java bytecode works with a stack (the istore/iload instructions), whereas machine code for x86 and most other platforms works with registers. As you can see, the machine code is "full" of registers, and having to name them in each instruction adds size to the compiled program compared with the simpler stack-based bytecode.
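To make the stack-based point concrete, here is a toy interpreter for just the opcodes that appear in the fibonacci() listing. It is a sketch, not how HotSpot works: `TinyStackMachine` is a made-up name, the opcode values are the standard JVM ones, and `invokestatic #12` is replaced by a stand-in that records the value, since the original `print` method is not shown. Note that instructions like `iadd` name no operands at all; they take them from the stack.

```java
import java.util.ArrayList;
import java.util.List;

class TinyStackMachine {
    // The 27 bytes of the fibonacci() listing, one instruction per line.
    static final int[] CODE = {
        0x03,             //  0: iconst_0
        0x3b,             //  1: istore_0
        0x04,             //  2: iconst_1
        0x3c,             //  3: istore_1
        0x1b,             //  4: iload_1
        0x11, 0x27, 0x10, //  5: sipush 10000
        0xa2, 0x00, 0x12, //  8: if_icmpge +18 (to 26)
        0x1b,             // 11: iload_1
        0x1a,             // 12: iload_0
        0x60,             // 13: iadd
        0x3c,             // 14: istore_1
        0x1b,             // 15: iload_1
        0x1a,             // 16: iload_0
        0x64,             // 17: isub
        0x3b,             // 18: istore_0
        0x1a,             // 19: iload_0
        0xb8, 0x00, 0x0c, // 20: invokestatic #12
        0xa7, 0xff, 0xed, // 23: goto -19 (to 4)
        0xb1              // 26: return
    };

    // Reads the signed 16-bit operand that follows the opcode at pc.
    static short operand(int pc) {
        return (short) ((CODE[pc + 1] << 8) | CODE[pc + 2]);
    }

    // Executes CODE and returns the values passed to the print stand-in.
    static List<Integer> run() {
        int[] locals = new int[2]; // slot 0 = lo, slot 1 = hi
        int[] stack = new int[4];  // operand stack
        int sp = 0;                // next free stack slot
        int pc = 0;                // offset of the current instruction
        List<Integer> printed = new ArrayList<>();
        while (true) {
            int op = CODE[pc];
            switch (op) {
                case 0x03: stack[sp++] = 0; pc += 1; break;           // iconst_0
                case 0x04: stack[sp++] = 1; pc += 1; break;           // iconst_1
                case 0x1a: stack[sp++] = locals[0]; pc += 1; break;   // iload_0
                case 0x1b: stack[sp++] = locals[1]; pc += 1; break;   // iload_1
                case 0x3b: locals[0] = stack[--sp]; pc += 1; break;   // istore_0
                case 0x3c: locals[1] = stack[--sp]; pc += 1; break;   // istore_1
                case 0x11: stack[sp++] = operand(pc); pc += 3; break; // sipush
                case 0x60: { // iadd: pop two values, push their sum
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    pc += 1;
                    break;
                }
                case 0x64: { // isub: pop two values, push their difference
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a - b;
                    pc += 1;
                    break;
                }
                case 0xa2: { // if_icmpge: branch if a >= b, else fall through
                    int b = stack[--sp];
                    int a = stack[--sp];
                    pc += (a >= b) ? operand(pc) : 3;
                    break;
                }
                case 0xa7: pc += operand(pc); break;                  // goto
                case 0xb8: printed.add(stack[--sp]); pc += 3; break;  // stand-in for print(lo)
                case 0xb1: return printed;                            // return
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("code length: " + CODE.length + " bytes");
        System.out.println("printed:     " + run());
    }
}
```

An x86 instruction such as `add esi, edi` must encode both registers, while `iadd` is a single byte; that is the trade-off the paragraph above describes.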

  • 2020-12-16 18:05

    In short: Java programs are compiled to Java byte code, which requires a separate interpreter (Java Virtual Machine) to be executed.

    There is no 100% guarantee about which of the two is smaller, the .o file produced by the C compiler or the .class file produced by the Java compiler. It all depends on the compiler implementation.
