F# vs OCaml: Stack overflow

情深已故 2021-01-30 06:31

I recently found a presentation about F# for Python programmers, and after watching it, I decided to implement a solution to the "ant puzzle" on my own.

There is a

2 answers
  •  鱼传尺愫
    2021-01-30 06:49

    Executive summary:

    • I wrote a simple implementation of an algorithm... that wasn't tail-recursive.
    • I compiled it with OCaml under Linux.
    • It worked fine, and finished in 0.14 seconds.

    It was then time to port to F#.

    • I translated the code (direct translation) to F#.
    • I compiled it under Windows and ran it: I got a stack overflow.
    • I took the binary to Linux and ran it under Mono.
    • It worked, but ran very slowly (84 seconds).

    I then posted to Stack Overflow - but some people decided to close the question (sigh).

    • I tried compiling with --optimize+ --checked-
    • The binary still stack overflowed under Windows...
    • ...but ran fine (and finished in 0.5 seconds) under Linux/Mono.

    It was time to check the stack size. Under Windows, another SO post pointed out that it is set to 1MB by default. Under Linux, "ulimit -s" and a compiled test program clearly showed that it is 8MB.

    This explained why the program worked under Linux and not under Windows (the program used more than 1MB of stack). It didn't explain why the optimized version ran so much better under Mono than the non-optimized one: 0.5 seconds vs 84 seconds (even though --optimize+ appears to be set by default, see the comment by Keith with the "Expert F#" extract). This probably has to do with Mono's garbage collector, which was somehow driven to extremes by the first version.

    The difference between Linux/OCaml and Linux/Mono/F# execution times (0.14 vs 0.5) is because of the simple way I measured it: "time ./binary ..." measures the startup time as well, which is significant for Mono/.NET (well, significant for this simple little problem).
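
    One way to keep runtime startup out of the measurement is to time the work from inside the program. The following is only a minimal sketch in OCaml: `time_it` and `work` are hypothetical names, and the workload is a stand-in rather than the ant puzzle solver. `Sys.time` reports the processor time used by the program so far, so the difference covers only the timed call.

```ocaml
(* Hypothetical helper: runs [f ()] and returns its result together
   with the processor time it consumed, in seconds. *)
let time_it f =
  let start = Sys.time () in
  let result = f () in
  (result, Sys.time () -. start)

(* Stand-in workload (an assumption for illustration only). *)
let work () =
  let total = ref 0 in
  for i = 1 to 1_000_000 do
    total := !total + i
  done;
  !total

let () =
  let result, seconds = time_it work in
  Printf.printf "result = %d, cpu time = %.3f s\n" result seconds
```

    On .NET and Mono, `System.Diagnostics.Stopwatch` can be used in a similar way from F#, although JIT compilation of the timed code would presumably still show up in the first measurement.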

    Anyway, to solve this once and for all, I wrote a tail-recursive version - where the recursive call at the end of the function is transformed into a loop (and hence, no stack usage is necessary - at least in theory).
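
    As a rough illustration of that transformation (in OCaml; `sum_naive` and `sum_tail` are hypothetical names, not the functions from the actual solution): the first version does its addition after the recursive call returns, so every call needs its own stack frame, while the second carries the running total in an accumulator so that the recursive call is the last thing the function does.

```ocaml
(* Not tail-recursive: the addition runs after the recursive call
   returns, so each call keeps a stack frame alive. *)
let rec sum_naive n =
  if n = 0 then 0 else n + sum_naive (n - 1)

(* Tail-recursive: the running total is carried in an accumulator,
   so the recursive call is the last action of the function. *)
let sum_tail n =
  let rec go acc n =
    if n = 0 then acc else go (acc + n) (n - 1)
  in
  go 0 n

let () =
  (* Small input: both versions agree. *)
  assert (sum_naive 1_000 = sum_tail 1_000);
  (* Large input: only the tail-recursive version is called here,
     since its stack usage does not grow with n. *)
  Printf.printf "%d\n" (sum_tail 10_000_000)
```

    The same accumulator pattern carries over to F# almost verbatim, since `let rec` and tail calls work similarly there.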

    The new version ran fine under Windows as well, and finished in 0.5 seconds.

    So, moral of the story:

    • Beware of your stack usage, especially if you use lots of it and run under Windows. Use EDITBIN with the /STACK option to set your binaries to larger stack sizes, or better yet, write your code in a manner that doesn't depend on using too much stack.
    • OCaml may be better at tail-recursion elimination than F# - or its garbage collector is doing a better job at this particular problem.
    • Don't despair about ...rude people closing your Stack Overflow questions, good people will counteract them in the end - if the questions are really good :-)

    P.S. Some additional input from Dr. Jon Harrop:

    ...you were just lucky that OCaml didn't overflow as well. You already identified that actual stack sizes vary between platforms. Another facet of the same issue is that different language implementations eat stack space at different rates and have different performance characteristics in the presence of deep stacks. OCaml, Mono and .NET all use different data representations and GC algorithms that impact these results...

    (a) OCaml uses tagged integers to distinguish pointers, giving compact stack frames, and will traverse everything on the stack looking for pointers. The tagging essentially conveys just enough information for the OCaml run time to be able to traverse the heap.

    (b) Mono treats words on the stack conservatively as pointers: if, as a pointer, a word would point into a heap-allocated block then that block is considered to be reachable.

    (c) I do not know .NET's algorithm but I wouldn't be surprised if it ate stack space faster and still traversed every word on the stack (it certainly suffers pathological performance from the GC if an unrelated thread has a deep stack!)...

    Moreover, your use of heap-allocated tuples means you'll be filling the nursery generation (e.g. gen0) quickly and, therefore, causing the GC to traverse those deep stacks often...
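
    The nursery effect described in the quote can be observed in OCaml with the standard `Gc` module. This is only a sketch: `build_pairs` and `minor_collections_during` are hypothetical names, and the tuple-building loop is a stand-in for the tuple-heavy code under discussion. It counts how many minor (nursery) collections happen while one tuple is allocated per step.

```ocaml
(* Hypothetical stand-in: allocates one heap tuple and one list cell
   per step. It is tail-recursive, so the stack itself stays shallow. *)
let build_pairs n =
  let rec go acc i =
    if i = 0 then acc else go ((i, i * 2) :: acc) (i - 1)
  in
  go [] n

(* Runs [f ()] and reports how many minor collections it triggered. *)
let minor_collections_during f =
  let before = (Gc.quick_stat ()).Gc.minor_collections in
  let result = f () in
  let after = (Gc.quick_stat ()).Gc.minor_collections in
  (result, after - before)

let () =
  let pairs, collections =
    minor_collections_during (fun () -> build_pairs 1_000_000)
  in
  Printf.printf "%d tuples, %d minor collections\n"
    (List.length pairs) collections
```

    Each of those minor collections has to scan the stack for roots, which is why frequent nursery collections combined with a deep stack can become expensive.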
