Basic I/O performance in Haskell

后端未结

关注

 3  724

Another microbenchmark: Why is this \"loop\" (compiled with ghc -O2 -fllvm, 7.4.1, Linux 64bit 3.2 kernel, redirected to /dev/null)


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  天命终不由人        
                
              
                            
                2021-01-03 08:41
              
            
            
                                                                       
The standard Haskell way to hand giant bytestrings over to the operating system is to use a builder monoid. 

import Data.ByteString.Lazy.Builder  -- requires bytestring-0.10.x
import Data.ByteString.Lazy.Builder.ASCII -- omit for bytestring-0.10.2.x
import Data.Monoid
import System.IO

main = hPutBuilder stdout $ build  [0..100000000::Int]

build = foldr add_line mempty
   where add_line n b = intDec n <> charUtf8 '\n' <> b


which gives me:

 $ time ./printbuilder >> /dev/null
 real   0m7.032s
 user   0m6.603s
 sys    0m0.398s


in contrast to Haskell approach you used

$ time ./print >> /dev/null
real    1m0.143s
user    0m58.349s
sys 0m1.032s


That is, it's child's play to do nine times better than  mapM_ print, contra Daniel Fischer's suprising defeatism. Everything you need to know is here: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Builder.html I won't compare it with your C since my results were much slower than Daniel's and n.m. so I figure something was going wrong.

Edit: Made the imports consistent with all versions of bytestring-0.10.x It occurred to me the following might be clearer -- the Builder equivalent of unlines . map show:

main = hPutBuilder stdout $ unlines_ $ map intDec [0..100000000::Int]
 where unlines_ = mconcat . map (<> charUtf8 '\n')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  死守一世寂寞        
                
              
                            
                2021-01-03 08:44
              
            
            
                                                                       
On my (rather slow and outdated) machine the results are:

$ time haskell-test > haskell-out.txt
real    1m57.497s
user    1m47.759s
sys     0m9.369s
$ time c-test > c-out.txt
real    7m28.792s
user    1m9.072s
sys     6m13.923s
$ diff haskell-out.txt c-out.txt
$


(I have fixed the list so that both C and Haskell start with 0).

Yes you read this right. Haskell is several times faster than C. Or rather, normally buffered Haskell is faster than C with write(2) non-buffered syscall.

(When measuring output to /dev/null instead of a real disk file, C is about 1.5 times faster, but who cares about /dev/null performance?)

Technical data: Intel E2140 CPU, 2 cores, 1.6 GHz, 1M cache, Gentoo Linux, gcc4.6.1, ghc7.6.1.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  死守一世寂寞        
                
              
                            
                2021-01-03 08:50
              
            
            
                                                                       
Okay, on my box the C code, compiled per gcc -O3 takes about 21.5 seconds to run, the original Haskell code about 56 seconds. So not a factor of 5, a bit above 2.5.

The first nontrivial difference is that

mapM_ print [1..100000000]


uses Integers, that's a bit slower because it involves a check upfront, and then works with boxed Ints, while the Show instance of Int does the conversion work on unboxed Int#s.

Adding a type signature, so that the Haskell code works on Ints,

mapM_ print [1 :: Int .. 100000000]


brings the time down to 47 seconds, a bit above twice the time the C code takes.

Now, another big difference is that show produces a linked list of Char and doesn't just fill a contiguous buffer of bytes. That is slower too.

Then that linked list of Chars is used to fill a byte buffer that then is written to the stdout handle.

So, the Haskell code does more, and more complicated things than the C code, thus it's not surprising that it takes longer.

Admittedly, it would be desirable to have an easy way to output such things more directly (and hence faster). However, the proper way to handle it is to use a more suitable algorithm (that applies to C too). A simple change to

putStr . unlines $ map show [0 :: Int .. 100000000]


almost halves the time taken, and if one wants it really fast, one uses the faster ByteString I/O and builds the output efficiently as exemplified in applicative's answer.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复