Why is Seq.iter 2x faster than a for loop when targeting x64?

一生所求 2021-02-20 17:00

Disclaimer: This is a micro-benchmark. Please do not reply with quotes such as "premature optimization is evil" if the topic bothers you.

Examples are release targ

2 answers
  •  佛祖请我去吃肉
    2021-02-20 17:56

    This isn't a complete answer, but I hope it helps you dig further.

    I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:

    open System
    
    let test1() =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}    
        for x in pool do
            for _ in 1..50 do
                for y in 1..200 do
                    ret.[2] <- x + y
    
    let test2() =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}    
        Seq.iter (fun x -> 
            for _ in 1..50 do
                for y in 1..200 do
                    ret.[2] <- x + y) pool
    
    let time f =
        let sw = new Diagnostics.Stopwatch()
        sw.Start()
        let result = f() 
        sw.Stop()
        Console.WriteLine(sw.Elapsed)
        result
    
    [<EntryPoint>]
    let main argv =
        time test1
        time test2
        0
    

    In this example, Seq.iter and for x in pool are each executed only once, yet there is still a 2x time difference between test1 and test2:

    00:00:06.9264843
    00:00:03.6834886
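
    Single-shot timings like these can be noisy. As a minimal sketch of a slightly more robust measurement, the helper below runs the function once as a warm-up and then reports the fastest of several timed runs. The name timeBest and the choice of reporting the minimum are assumptions for illustration, not part of the original benchmark.

    ```fsharp
    open System
    open System.Diagnostics

    // Hypothetical helper (not from the original post): calls `f` once to warm up,
    // then times `runs` further calls and returns the fastest elapsed time.
    let timeBest (runs: int) (f: unit -> unit) : TimeSpan =
        f ()  // warm-up call, kept out of the timings
        let mutable best = TimeSpan.MaxValue
        for _ in 1 .. runs do
            let sw = Stopwatch.StartNew()
            f ()
            sw.Stop()
            if sw.Elapsed < best then best <- sw.Elapsed
        best
    ```

    With the functions above it could be used as Console.WriteLine(timeBest 3 test1), which calls test1 four times in total.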
    

    The IL for the two functions is very similar, so the F# compiler's own optimizations do not explain the gap. It seems that the x64 JIT fails to optimize test1 even though it manages to do so for test2. Interestingly, if I refactor the nested for loops in test1 into a separate function, the JIT optimizes the code again:

    let body (ret: _ []) x =
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y
    
    let test3() =
        let ret = Array.zeroCreate 100
        let pool = {1..1000000}    
        for x in pool do
            body ret x
    
    // 00:00:03.7012302
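
    To see why the IL of these variants can look so alike, it helps to write the enumeration out by hand. The sketch below is only an approximation of what a for ... in loop over a seq expands to, and of how Seq.iter is typically written; iterByHand is a hypothetical name, and the real FSharp.Core source may differ in details. The visible difference between the variants is where the nested loops live: inline in the enumerating method (test1), or in a separate method (the closure in test2, body in test3).

    ```fsharp
    // Hypothetical helper: an enumerator loop written out explicitly.
    // `use` disposes the enumerator when the function exits.
    let iterByHand (action: int -> unit) (pool: seq<int>) : unit =
        use e = pool.GetEnumerator()
        while e.MoveNext() do
            action e.Current

    // Usage: the loop body is passed in as a separate function, as in test2.
    let ret = Array.zeroCreate<int> 100
    iterByHand (fun x -> ret.[2] <- x) (seq { 1 .. 10 })
    ```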
    

    When I disable JIT optimization using the technique described here, the execution times of these functions are comparable.
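
    The linked technique is not reproduced in this text. As one possible way to try the same experiment, and this is an assumption rather than the method the link describes, .NET lets you mark an individual method with MethodImplOptions.NoOptimization. The sketch below applies it to a smaller copy of test1; the name test1Unoptimized, the reduced range, and the returned value are for illustration only.

    ```fsharp
    open System.Runtime.CompilerServices

    // Assumption: this attribute asks the JIT not to optimize this one method.
    // It may not be the technique referred to in the answer.
    [<MethodImpl(MethodImplOptions.NoOptimization)>]
    let test1Unoptimized () =
        let ret = Array.zeroCreate<int> 100
        let pool = seq { 1 .. 1000 }  // smaller than the original 1000000 to keep the run short
        for x in pool do
            for _ in 1 .. 50 do
                for y in 1 .. 200 do
                    ret.[2] <- x + y
        ret.[2]  // returned so the result can be checked
    ```

    Timing this next to an unannotated copy of the same function would show how much of the gap depends on JIT optimization.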

    I don't know why the x64 JIT fails in this particular example. You can disassemble the optimized JIT-compiled code and compare the assembly instructions line by line. Maybe someone with good assembly knowledge can spot the differences.
