Disclaimer: This is a micro-benchmark; please do not comment with quotes such as "premature optimization is evil" if you feel unhappy about the topic.
The examples are built with the Release target.
When I run the experiment on my machine (using F# 3.0 in VS 2012 in Release mode), I do not get the times you describe. Do you consistently get the same numbers when you run it repeatedly?
I tried it about 4 times and always get very similar numbers. The version with `Seq.iter` tends to be slightly faster, but the difference is probably not statistically significant. The numbers look something like this (measured using `Stopwatch`):
test(1) = 15321ms
test(2) = 5149ms
test(3) = 14290ms
test(4) = 4999ms
I'm running the test on a laptop with an Intel Core 2 Duo (2.26 GHz) and 64-bit Windows 7.
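A small harness like the one below can help check whether the numbers stay consistent across repeated runs. It is a sketch under a few assumptions: `timeRepeated` and `report` are hypothetical helpers that are not part of the original code, and the untimed warm-up call is meant to keep JIT compilation out of the measured runs.

```fsharp
open System
open System.Diagnostics

// Hypothetical helper: runs f once untimed (warm-up, so JIT compilation is
// not measured), then times `runs` further calls and returns each elapsed time.
let timeRepeated (runs: int) (f: unit -> unit) : TimeSpan list =
    f ()
    [ for _ in 1 .. runs do
        let sw = Stopwatch.StartNew()
        f ()
        sw.Stop()
        yield sw.Elapsed ]

// Hypothetical helper: prints every run, then the min and mean,
// so run-to-run variation is easy to see.
let report (name: string) (times: TimeSpan list) =
    let ms = times |> List.map (fun t -> t.TotalMilliseconds)
    for m in ms do
        printfn "%s: %.0f ms" name m
    printfn "%s: min %.0f ms, mean %.0f ms" name (List.min ms) (List.average ms)
```

For example, `timeRepeated 4 test1 |> report "test1"` would print four timings for a `unit -> unit` benchmark function such as the `test1` below, along with their minimum and mean.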
This isn't a complete answer, but I hope it helps you dig further.
I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:
```fsharp
open System

// Outer loop written as `for x in pool do` over a seq<int>
let test1() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}
    for x in pool do
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y

// Same work, but the outer loop is driven by Seq.iter with a lambda
let test2() =
    let ret = Array.zeroCreate 100
    let pool = {1 .. 1000000}
    Seq.iter (fun x ->
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y) pool

// Runs f once and prints the elapsed wall-clock time
let time f =
    let sw = new Diagnostics.Stopwatch()
    sw.Start()
    let result = f()
    sw.Stop()
    Console.WriteLine(sw.Elapsed)
    result

[<EntryPoint>]
let main argv =
    time test1
    time test2
    0
```
In this example, `Seq.iter` and `for x in pool` are each executed only once, but there is still a 2x time difference between `test1` and `test2`:
00:00:06.9264843
00:00:03.6834886
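Both versions pull elements from the same `seq<int>` through an `IEnumerator<int>`. What differs is where the nested loops live. Below is a rough hand-desugared sketch of the two shapes. The names `forInDesugared` and `seqIterLike` are hypothetical, and the code approximates the compiler's output and FSharp.Core's `Seq.iter` rather than reproducing either exactly.

```fsharp
// Roughly how `for x in pool do ...` in test1 is compiled when pool is a
// seq<int>: an enumerator loop with the nested loops inline in the same method.
let forInDesugared (ret: int[]) (pool: seq<int>) =
    use e = pool.GetEnumerator()      // disposed when the function exits
    while e.MoveNext() do
        let x = e.Current
        for _ in 1 .. 50 do
            for y in 1 .. 200 do
                ret.[2] <- x + y

// Roughly what Seq.iter does: the same enumerator loop, but the per-element
// work is a closure (the lambda in test2) that is invoked for every element.
let seqIterLike (action: 'T -> unit) (source: seq<'T>) =
    use e = source.GetEnumerator()
    while e.MoveNext() do
        action e.Current
```

In the first shape, the nested loops share one method with the enumerator calls. In the second, they are compiled into the closure's own method, much like the separate `body` function in `test3`.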
Their IL is very similar, so compiler optimization isn't the problem. It seems that the x64 JITter fails to optimize `test1`, though it is able to do so with `test2`. Interestingly, if I refactor the nested `for` loops in `test1` into a function, JIT optimization succeeds again:
```fsharp
let body (ret: _ []) x =
    for _ in 1..50 do
        for y in 1..200 do
            ret.[2] <- x + y

let test3() =
    let ret = Array.zeroCreate 100
    let pool = {1..1000000}
    for x in pool do
        body ret x
// 00:00:03.7012302
```
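This workaround presumably relies on `body` staying a separate method. One way to pin that down is `MethodImplOptions.NoInlining`, a standard .NET flag that should stop the method from being inlined back into its caller. The sketch below rests on that assumption. `bodyNoInline` and `test3NoInline` are hypothetical names, the `n` parameter replaces the fixed 1000000 so the code can also run at small sizes, and its timing has not been measured here.

```fsharp
open System.Runtime.CompilerServices

// Same loop body as `body`, but explicitly marked NoInlining so it should
// remain its own method rather than being merged into the calling loop.
[<MethodImpl(MethodImplOptions.NoInlining)>]
let bodyNoInline (ret: int[]) (x: int) =
    for _ in 1 .. 50 do
        for y in 1 .. 200 do
            ret.[2] <- x + y

// Hypothetical variant of test3; returns the array so the result can be checked.
let test3NoInline (n: int) =
    let ret : int[] = Array.zeroCreate 100
    let pool = seq { 1 .. n }
    for x in pool do
        bodyNoInline ret x
    ret
```

Calling `test3NoInline 1000000` inside the `time` helper above would give a number that is directly comparable to `test3`.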
When I disable JIT optimization using the technique described here, execution times of these functions are comparable.
I don't know why the x64 JITter fails in this particular example. You can disassemble the optimized JIT-compiled code and compare the assembly instructions line by line. Maybe someone with good assembly knowledge can work out where they differ.