Why is Seq.iter 2x faster than a for loop when targeting x64?

一生所求 2021-02-20 17:00

Disclaimer: this is a micro-benchmark; please do not comment with quotes such as "premature optimization is evil" if you feel unhappy about the topic.

Examples are compiled in Release mode, targeting x64.

2 Answers
  • 2021-02-20 17:49

    When I run the experiment on my machine (using F# 3.0 in VS 2012 in Release mode), I do not get the times you describe. Do you consistently get the same numbers when you run it repeatedly?

    I tried it about 4 times and I always get numbers that are very similar. The version with Seq.iter tends to be slightly faster, but this is probably not statistically significant. Something like this, using Stopwatch (a sketch of such a harness follows the numbers below):

    test(1) = 15321ms
    test(2) = 5149ms
    test(3) = 14290ms
    test(4) = 4999ms
    

    I'm running the test on a laptop with an Intel Core2 Duo (2.26GHz), using 64-bit Windows 7.
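
    A minimal sketch of the kind of Stopwatch harness behind those numbers; the helper name runTimed and the repeat count of 4 are illustrative choices, not taken from the actual measurement code:

    open System.Diagnostics

    /// Runs f once and prints the elapsed wall-clock time in milliseconds.
    let runTimed label (f: unit -> unit) =
        let sw = Stopwatch.StartNew()
        f ()
        sw.Stop()
        printfn "%s = %dms" label sw.ElapsedMilliseconds

    // Repeat the measurement several times so a one-off outlier stands out.
    for i in 1 .. 4 do
        runTimed (sprintf "test(%d)" i) (fun () ->
            // body under test goes here, e.g. the Seq.iter or for-loop version
            ())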

  • 2021-02-20 17:56

    This isn't a complete answer, but I hope it helps you dig further.

    I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:

    open System
    
    let test1() =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}
        // plain nested for loops over the sequence
        for x in pool do
            for _ in 1..50 do
                for y in 1..200 do
                    ret.[2] <- x + y

    let test2() =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}
        // identical body, but the outer iteration goes through Seq.iter
        Seq.iter (fun x ->
            for _ in 1..50 do
                for y in 1..200 do
                    ret.[2] <- x + y) pool
    
    let time f =
        let sw = new Diagnostics.Stopwatch()
        sw.Start()
        let result = f() 
        sw.Stop()
        Console.WriteLine(sw.Elapsed)
        result
    
    [<EntryPoint>]
    let main argv =
        time test1
        time test2
        0
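
    To reproduce the configuration from the question, both functions need to be built as optimized x64 code; one possible command-line build is sketched below (the file name Program.fs is only illustrative, and the Release/x64 platform-target settings of a Visual Studio project are equivalent):

    fsc --optimize+ --platform:x64 Program.fs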
    

    In this example, Seq.iter and for x in pool are each executed only once, but there is still a 2x time difference between test1 and test2:

    00:00:06.9264843
    00:00:03.6834886
    

    Their ILs are very similar, so compiler optimization isn't the problem. It seems that the x64 JITter fails to optimize test1, though it manages to do so for test2. Interestingly, if I refactor the nested for loops in test1 into a separate function, JIT optimization succeeds again:

    // the nested loops from test1, extracted into a named function
    let body (ret: _ []) x =
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y
    
    let test3() =
        let ret = Array.zeroCreate 100
        let pool = {1..1000000}    
        for x in pool do
            body ret x
    
    // 00:00:03.7012302
    

    When I disable JIT optimization using the technique described here, execution times of these functions are comparable.
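
    One well-known way to suppress JIT optimization for a whole assembly, which may or may not be the exact technique in the linked article, is an assembly-level DebuggableAttribute, roughly:

    open System.Diagnostics

    // Tells the CLR to keep tracking information and not to optimize jitted code
    // for this assembly; put it in any source file of the benchmark project.
    [<assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default |||
                           DebuggableAttribute.DebuggingModes.DisableOptimizations)>]
    do ()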

    Why the x64 JITter fails in this particular example, I don't know. You can disassemble the optimized jitted code and compare the ASM instructions line by line (for example in Visual Studio's Disassembly window, debugging a Release build with "Suppress JIT optimization on module load" unchecked). Maybe someone with good ASM knowledge can pinpoint the difference.
