Disclaimer: this is a micro-benchmark, so please don't comment with quotes such as "premature optimization is evil" if the topic makes you unhappy.
Examples are built in Release mode, targeting x64.
This isn't a complete answer, but I hope it helps you go further.
I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:
```fsharp
open System

let test1() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}
    for x in pool do
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y

let test2() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}
    Seq.iter (fun x ->
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y) pool

let time f =
    let sw = new Diagnostics.Stopwatch()
    sw.Start()
    let result = f()
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    result

[<EntryPoint>]
let main argv =
    time test1
    time test2
    0
```
In this example, `Seq.iter` and `for x in pool` are each executed only once, but there is still a 2x time difference between `test1` and `test2`:

```
00:00:06.9264843
00:00:03.6834886
```
Their ILs are very similar, so compiler optimization isn't the problem. It seems that the x64 jitter fails to optimize `test1`, though it is able to do so for `test2`. Interestingly, if I refactor the nested for loops in `test1` into a separate function, JIT optimization succeeds again:
```fsharp
let body (ret: _ []) x =
    for _ in 1..50 do
        for y in 1..200 do
            ret.[2] <- x + y

let test3() =
    let ret = Array.zeroCreate 100
    let pool = {1..1000000}
    for x in pool do
        body ret x

// 00:00:03.7012302
```
When I disable JIT optimization using the technique described here, the execution times of these functions are comparable.
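For completeness, one common way to suppress JIT optimization for a single method (this may differ from the linked technique, and the method name below is hypothetical) is the `MethodImpl` attribute:

```fsharp
open System.Runtime.CompilerServices

// NoInlining keeps the method as a distinct unit of jitted code;
// NoOptimization asks the JIT to skip its optimization passes for it.
[<MethodImpl(MethodImplOptions.NoInlining ||| MethodImplOptions.NoOptimization)>]
let test1Unoptimized() =
    let ret = Array.zeroCreate 100
    for x in {1 .. 1000000} do
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y
```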
Why the x64 jitter fails in this particular example, I don't know. You can disassemble the optimized jitted code and compare the ASM instructions line by line. Maybe someone with good ASM knowledge can find the difference.
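As a pointer for anyone reproducing this on a modern runtime (the original setup is the classic x64 jitter, where you would typically use Visual Studio's Disassembly window or WinDbg): recent .NET versions can print the jitted x64 code directly via an environment variable, for example:

```shell
# Dump the JIT's x64 output for the two methods from the example above.
# DOTNET_JitDisasm works on release builds of .NET 7+; older runtimes used
# COMPlus_JitDisasm and generally required a checked/debug build of the runtime.
export DOTNET_JitDisasm="test1 test2"
dotnet run -c Release
```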