What makes this function run much slower?

后端 未结 3 1442
有刺的猬
有刺的猬 2021-01-30 04:58

I\'ve been trying to make an experiment to see if the local variables in functions are stored on a stack.

So I wrote a little performance test

function t         


        
相关标签:
3条回答
  • 2021-01-30 05:31

    Inlining the function to the test function makes it run always the same time.
    Is it possible that there is an optimization that inlines the function parameter if it's always the same function?

    Yes, this seems to be exactly what you are observing. As already mentioned by @Luaan, the compiler likely drops the bodies of your straight and inverse functions anyway because they are not having any side effects, but only manipulating some local variables.

    When you are calling test(…, 100000) for the first time, the optimising compiler realises after some iterations that the fn() being called is always the same, and does inline it, avoiding the costly function call. All that it does now is 10 million times decrementing a variable and testing it against 0.

    But when you are calling test with a different fn then, it has to de-optimise. It may later do some other optimisations again, but now knowing that there are two different functions to be called it cannot inline them any more.

    Since the only thing you're really measuring is the function call, that leads to the grave differences in your results.

    An experiment to see if the local variables in functions are stored on a stack

    Regarding your actual question, no, single variables are not stored on a stack (stack machine), but in registers (register machine). It doesn't matter in which order they are declared or used in your function.

    Yet, they are stored on the stack, as part of so called "stack frames". You'll have one frame per function call, storing the variables of its execution context. In your case, the stack might look like this:

    [straight: a, b, c, d, e]
    [test: fn, times, i, t]
    …
    
    0 讨论(0)
  • 2021-01-30 05:34

    Once you call test with two different functions fn() callsite inside it becomes megamorphic and V8 is unable to inline at it.

    Function calls (as opposed to method calls o.m(...)) in V8 are accompanied by one element inline cache instead of a true polymorphic inline cache.

    Because V8 is unable to inline at fn() callsite it is unable to apply a variety of optimizations to your code. If you look at your code in IRHydra (I uploaded compilation artifacts to gist for your convinience) you will notice that first optimized version of test (when it was specialized for fn = straight) has a completely empty main loop.

    V8 just inlined straight and removed all the code your hoped to benchmark with Dead Code Elimination optimization. On an older version of V8 instead of DCE V8 would just hoist the code out of the loop via LICM - because the code is completely loop invariant.

    When straight is not inlined V8 can't apply these optimizations - hence the performance difference. Newer version of V8 would still apply DCE to straight and inversed themselves turning them into empty functions

    so the performance difference is not that big (around 2-3x). Older V8 was not aggressive enough with DCE - and that would manifest in bigger difference between inlined and not-inlined cases, because peak performance of inlined case was solely result of aggressive loop-invariant code motion (LICM).

    On related note this shows why benchmarks should never be written like this - as their results are not of any use as you end up measuring an empty loop.

    If you are interested in polymorphism and its implications in V8 check out my post "What's up with monomorphism" (section "Not all caches are the same" talks about the caches associated with function calls). I also recommend reading through one of my talks about dangers of microbenchmarking, e.g. most recent "Benchmarking JS" talk from GOTO Chicago 2015 (video) - it might help you to avoid common pitfalls.

    0 讨论(0)
  • 2021-01-30 05:48

    You're misunderstanding the stack.

    While the "real" stack indeed only has the Push and Pop operations, this doesn't really apply for the kind of stack used for execution. Apart from Push and Pop, you can also access any variable at random, as long as you have its address. This means that the order of locals doesn't matter, even if the compiler doesn't reorder it for you. In pseudo-assembly, you seem to think that

    var x = 1;
    var y = 2;
    
    x = x + 1;
    y = y + 1;
    

    translates to something like

    push 1 ; x
    push 2 ; y
    
    ; get y and save it
    pop tmp
    ; get x and put it in the accumulator
    pop a
    ; add 1 to the accumulator
    add a, 1
    ; store the accumulator back in x
    push a
    ; restore y
    push tmp
    ; ... and add 1 to y
    

    In truth, the real code is more like this:

    push 1 ; x
    push 2 ; y
    
    add [bp], 1
    add [bp+4], 1
    

    If the thread stack really was a real, strict stack, this would be impossible, true. In that case, the order of operations and locals would matter much more than it does now. Instead, by allowing random access to values on the stack, you save a lot of work for both the compilers, and the CPU.

    To answer your actual question, I'm suspecting neither of the functions actually does anything. You're only ever modifying locals, and your functions aren't returning anything - it's perfectly legal for the compiler to completely drop the function bodies, and possibly even the function calls. If that's indeed so, whatever performance difference you're observing is probably just a measurement artifact, or something related to the inherent costs of calling a function / iterating.

    0 讨论(0)
提交回复
热议问题