Another microbenchmark: Why is this \"loop\" (compiled with ghc -O2 -fllvm
, 7.4.1, Linux 64bit 3.2 kernel, redirected to /dev/null
)
The standard Haskell way to hand giant bytestrings over to the operating system is to use a builder monoid.
import Data.ByteString.Lazy.Builder -- requires bytestring-0.10.x
import Data.ByteString.Lazy.Builder.ASCII -- omit for bytestring-0.10.2.x
import Data.Monoid
import System.IO
main = hPutBuilder stdout $ build [0..100000000::Int]
build = foldr add_line mempty
where add_line n b = intDec n <> charUtf8 '\n' <> b
which gives me:
$ time ./printbuilder >> /dev/null
real 0m7.032s
user 0m6.603s
sys 0m0.398s
in contrast to Haskell approach you used
$ time ./print >> /dev/null
real 1m0.143s
user 0m58.349s
sys 0m1.032s
That is, it's child's play to do nine times better than mapM_ print
, contra Daniel Fischer's suprising defeatism. Everything you need to know is here: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Builder.html I won't compare it with your C since my results were much slower than Daniel's and n.m. so I figure something was going wrong.
Edit: Made the imports consistent with all versions of bytestring-0.10.x
It occurred to me the following might be clearer -- the Builder
equivalent of unlines . map show
:
main = hPutBuilder stdout $ unlines_ $ map intDec [0..100000000::Int]
where unlines_ = mconcat . map (<> charUtf8 '\n')
On my (rather slow and outdated) machine the results are:
$ time haskell-test > haskell-out.txt
real 1m57.497s
user 1m47.759s
sys 0m9.369s
$ time c-test > c-out.txt
real 7m28.792s
user 1m9.072s
sys 6m13.923s
$ diff haskell-out.txt c-out.txt
$
(I have fixed the list so that both C and Haskell start with 0).
Yes you read this right. Haskell is several times faster than C. Or rather, normally buffered Haskell is faster than C with write(2) non-buffered syscall.
(When measuring output to /dev/null instead of a real disk file, C is about 1.5 times faster, but who cares about /dev/null performance?)
Technical data: Intel E2140 CPU, 2 cores, 1.6 GHz, 1M cache, Gentoo Linux, gcc4.6.1, ghc7.6.1.
Okay, on my box the C code, compiled per gcc -O3
takes about 21.5 seconds to run, the original Haskell code about 56 seconds. So not a factor of 5, a bit above 2.5.
The first nontrivial difference is that
mapM_ print [1..100000000]
uses Integer
s, that's a bit slower because it involves a check upfront, and then works with boxed Int
s, while the Show
instance of Int
does the conversion work on unboxed Int#
s.
Adding a type signature, so that the Haskell code works on Int
s,
mapM_ print [1 :: Int .. 100000000]
brings the time down to 47 seconds, a bit above twice the time the C code takes.
Now, another big difference is that show
produces a linked list of Char
and doesn't just fill a contiguous buffer of bytes. That is slower too.
Then that linked list of Char
s is used to fill a byte buffer that then is written to the stdout
handle.
So, the Haskell code does more, and more complicated things than the C code, thus it's not surprising that it takes longer.
Admittedly, it would be desirable to have an easy way to output such things more directly (and hence faster). However, the proper way to handle it is to use a more suitable algorithm (that applies to C too). A simple change to
putStr . unlines $ map show [0 :: Int .. 100000000]
almost halves the time taken, and if one wants it really fast, one uses the faster ByteString
I/O and builds the output efficiently as exemplified in applicative's answer.