Why does JavaScript appear to be 4 times faster than C++?

长发绾君心 2021-01-29 22:42

For a long time, I had thought of C++ being faster than JavaScript. However, today I made a benchmark script to compare the speed of floating point calculations in the two langu

5 answers
  •  滥情空心
    2021-01-29 23:41

    Even though this post is old, I think it is worth adding some information. In summary, your test is too vague and may be biased.

    A bit about speed testing methodology

    When comparing speed of two languages, you first have to define precisely in which context you want to compare how they perform.

    • "naive" vs "optimized" code : whether the code under test is written by a beginner or an expert programmer. This parameter matters depending on who will participate in your project. For example, when working with scientists (non geeky ones), you will care more about "naive" code performance, because scientists aren't necessarily good programmers.

    • authorized compile time : how long you allow the code to take to build. This parameter can matter depending on your project management methodology. If you need to run automated tests, trading a bit of runtime speed for a shorter compile time can be worthwhile. On the other hand, you can allow a long build time for the distributed version.

    • Platform portability : whether speed should be compared on one platform or on several (Windows, Linux, PS4...)

    • Compiler/interpreter portability : whether your code's speed should be compiler/interpreter independent or not. This can be useful for multiplatform and/or open source projects.

    • Other specialized parameters, for example whether you allow dynamic allocations in your code, whether you want to enable plugins (libraries loaded dynamically at runtime), etc.

    Then, you have to make sure that your code is representative of what you want to test

    Here (I assume you didn't compile the C++ with optimization flags), you are testing the fast-compile speed of "naive" (not so naive, actually) code. Because your loop has a fixed size and fixed data, you don't test dynamic allocations, and you (supposedly) allow code transformations (more on that in the next section). And indeed, JavaScript usually performs better than C++ in this case, because JavaScript engines optimize hot code by default when they JIT-compile it, while C++ compilers need to be told to optimize.
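    To make the comparison concrete, here is a minimal timing sketch for the C++ side. It is not the asker's benchmark: the names `repeated_add` and `average_ns`, and the `iterations` and `runs` parameters, are hypothetical and only illustrate how the same source can be timed under different compiler flags.

```cpp
#include <cassert>
#include <chrono>

// Repeated float addition, in the spirit of the loop discussed here.
// `iterations` is a hypothetical parameter added for this sketch.
float repeated_add(int iterations) {
    float a = 0.0f;
    float b = 2.71f;
    for (int i = 0; i < iterations; ++i) {
        a = a + b;
    }
    return a;
}

// Returns the average wall-clock time of one call, in nanoseconds.
double average_ns(int iterations, int runs) {
    assert(runs > 0);
    // Writing to a volatile keeps the optimizer from discarding the result.
    // An optimizing build may still hoist or simplify the call itself,
    // so treat the numbers as a rough indication only.
    volatile float sink = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r) {
        sink = repeated_add(iterations);
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

    Building this once with -O0 and once with -O3, then calling something like average_ns(100000, 1000) from main, should show the kind of gap discussed in the next section. A dedicated benchmarking framework would give more trustworthy numbers.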

    A quick overview of C++ speed increase with parameters

    Because I am not knowledgeable enough about JavaScript, I'll only show how code optimization and compilation type can change C++ speed on a fixed for loop, hoping it will answer the question of how JS can appear to be faster than C++.

    For that, let's use Matt Godbolt's C++ Compiler Explorer to see the assembly code generated by gcc 9.2.

    Non optimized code

    float func(){
        float a(0.0);
        float b(2.71);
        for (int i = 0;  i < 100000; ++i){
            a = a + b;
        }
        return a;
    }
    

    compiled with : gcc 9.2, flag -O0. Produces the following assembly code :

    func():
            pushq   %rbp
            movq    %rsp, %rbp
            pxor    %xmm0, %xmm0
            movss   %xmm0, -4(%rbp)
            movss   .LC1(%rip), %xmm0
            movss   %xmm0, -12(%rbp)
            movl    $0, -8(%rbp)
    .L3:
            cmpl    $99999, -8(%rbp)
            jg      .L2
            movss   -4(%rbp), %xmm0
            addss   -12(%rbp), %xmm0
            movss   %xmm0, -4(%rbp)
            addl    $1, -8(%rbp)
            jmp     .L3
    .L2:
            movss   -4(%rbp), %xmm0
            popq    %rbp
            ret
    .LC1:
            .long   1076719780
    
    

    The code for the loop is what is between ".L3" and ".L2". In short, the code generated here is not optimized at all : both a and the loop counter live on the stack instead of in registers, so every iteration stores the result to memory and reloads it, which wastes a lot of operations.

    This introduces an extra 5 or 6 cycles of store-forwarding latency into the critical path dependency chain of FP addition into a, on modern x86 CPUs. This is on top of the 4 or 5 cycle latency of addss, making the function more than twice as slow.
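    The "more than twice as slow" figure follows from simple arithmetic on the cycle counts quoted above. As a rough sketch (the helper name `slowdown` is hypothetical, and the model ignores everything except the dependency chain through a):

```cpp
// Rough per-iteration latency model using only the cycle counts quoted above:
// the -O0 loop pays store-forwarding latency plus addss latency per iteration,
// while a register-only loop pays addss latency alone.
constexpr double slowdown(double store_forward_cycles, double addss_cycles) {
    return (store_forward_cycles + addss_cycles) / addss_cycles;
}

// (5 + 4) / 4 = 2.25
static_assert(slowdown(5.0, 4.0) == 2.25, "low end of the quoted ranges");
// (6 + 5) / 5 = 2.2
static_assert(slowdown(6.0, 5.0) > 2.0, "high end of the quoted ranges");
```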

    compiler optimization

    The same C++ compiled with gcc 9.2, flag -O3. Produces the following assembly code:

    func():
            movss   .LC1(%rip), %xmm1
            movl    $100000, %eax
            pxor    %xmm0, %xmm0
    .L2:
            addss   %xmm1, %xmm0
            subl    $1, %eax
            jne     .L2
            ret
    .LC1:
            .long   1076719780
    

    The code is much more concise, and keeps a, b and the loop counter in registers : the loop body is now a single addss plus the counter update and branch.

    code optimization

    A compiler usually optimizes code very well, especially C++, provided that the code expresses clearly what the programmer wants to achieve. Here we want a fixed mathematical expression to be as fast as possible, so let's change the code a bit.

    constexpr float func(){
        float a(0.0);
        float b(2.71);
        for (int i = 0;  i < 100000; ++i){
            a = a + b;
        }
        return a;
    }
    
    float call() {
        return func();
    }
    

    We added constexpr to the function to tell the compiler that it may compute its result at compile time, and added a calling function to be sure that some code is generated.

    Compiled with gcc 9.2, -O3, this leads to the following assembly code :

    call():
            movss   .LC0(%rip), %xmm0
            ret
    .LC0:
            .long   1216623031
    

    The asm code is short, since the value returned by func has been computed at compile time, and call simply returns it.
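    The .long constants in these listings are just the raw bit patterns of floats. A small sketch to decode them, assuming a 32-bit IEEE 754 float as on x86-64 (the helper name `float_from_bits` is hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a single-precision float.
// Assumes float is 32 bits wide and uses IEEE 754 encoding.
float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);  // well-defined way to copy the bits
    return value;
}
```

    float_from_bits(1076719780u) gives back 2.71f, the constant b, and float_from_bits(1216623031u) is the precomputed return value, about 270733.7. It is not exactly 271000 because each of the 100000 additions is rounded to float.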


    Of course, a = b * 100000 would always compile to efficient asm, so only write the repeated-add loop if you need to explore FP rounding error over all those temporaries.
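    As a sketch of that difference (the function names are hypothetical), the two approaches can be compared directly. Judging from the constant gcc folded above, which decodes to roughly 270733.7, the drift should be on the order of a few hundred:

```cpp
#include <cmath>

// The repeated-add loop: every addition is rounded to float,
// so the rounding errors accumulate over 100000 iterations.
float by_repeated_add() {
    float a = 0.0f;
    float b = 2.71f;
    for (int i = 0; i < 100000; ++i) {
        a = a + b;
    }
    return a;
}

// A single multiplication: only one rounding step.
float by_multiplication() {
    float b = 2.71f;
    return b * 100000;
}

// Absolute difference between the two results.
float rounding_drift() {
    return std::fabs(by_multiplication() - by_repeated_add());
}
```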
