GHC 7.10 generates slower code than older versions

前端 未结 1 481
有刺的猬
有刺的猬 2020-12-15 06:18

I realized that the latest version of GHC (7.10.3) produces significantly slower code than an older version. My current version as of now:

$ ghc --version
Th         


        
相关标签:
1条回答
  • 2020-12-15 06:56

    Here's a comparison of both profiles (diff Test-GHC-7-8-4.prof Test-GHC-7-10-3.prof)

    1c1                               
    <       Fri Mar 11 19:58 2016 Time and Allocation Profiling Report  (Final)
    ---                               
    >       Fri Mar 11 19:59 2016 Time and Allocation Profiling Report  (Final)
    5,6c5,6                               
    <       total time  =        2.40 secs   (2400 ticks @ 1000 us, 1 processor)
    <       total alloc = 256,066,744 bytes  (excludes profiling overheads)
    ---                               
    >       total time  =       10.89 secs   (10895 ticks @ 1000 us, 1 processor)
    >       total alloc = 15,713,590,808 bytes  (excludes profiling overheads)
    10,13c10,12                               
    < lenIterWhile.len Main     93.8   0.0                    
    < collatzMax       Main      2.2   93.7
    < collatzNext      Main      2.0    0.0
    < lenIterWhile     Main      1.5    6.2
    ---                                
    > collatzNext      Main     79.6   89.4
    > lenIterWhile.len Main     18.9    8.8
    > collatzMax       Main      0.8    1.5
    

    There's something very strange going on. While in GHC lenIterWhile.len was taking most of the time, collatzNext is now the culprit. Let's have a look at the dumped core:

    -- GHC 7.8.4
    Rec {
    Main.$wlen [Occ=LoopBreaker]
      :: GHC.Prim.Word# -> GHC.Prim.Int# -> GHC.Prim.Int#
    [GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <S,1*U><L,U>]
    Main.$wlen =
      \ (ww_s4Mn :: GHC.Prim.Word#) (ww1_s4Mr :: GHC.Prim.Int#) ->
        case ww_s4Mn of wild_XQ {
          __DEFAULT ->
            case GHC.Prim.remWord# wild_XQ (__word 2) of _ [Occ=Dead] {
              __DEFAULT ->
                Main.$wlen
                  (GHC.Prim.quotWord#
                     (GHC.Prim.narrow32Word#
                        (GHC.Prim.plusWord#
                           (GHC.Prim.narrow32Word# (GHC.Prim.timesWord# (__word 3) wild_XQ))
                           (__word 1)))
                     (__word 2))
                  (GHC.Prim.+# ww1_s4Mr 1);
              __word 0 ->
                Main.$wlen
                  (GHC.Prim.quotWord# wild_XQ (__word 2)) (GHC.Prim.+# ww1_s4Mr 1)
            };
          __word 1 -> ww1_s4Mr
        }
    end Rec }
    

    Seems more or less reasonable. Now about GHC 7.10.3:

    Rec {
    $wlen_r6Sy :: GHC.Prim.Word# -> GHC.Prim.Int# -> GHC.Prim.Int#
    [GblId, Arity=2, Str=DmdType <S,U><L,U>]
    $wlen_r6Sy =
      \ (ww_s60s :: GHC.Prim.Word#) (ww1_s60w :: GHC.Prim.Int#) ->
        case ww_s60s of wild_X1Z {
          __DEFAULT ->
            case even
                   @ Word32 GHC.Word.$fIntegralWord32 (GHC.Word.W32# wild_X1Z)
            of _ [Occ=Dead] {
              False ->
                $wlen_r6Sy
                  (GHC.Prim.quotWord#
                     (GHC.Prim.narrow32Word#
                        (GHC.Prim.plusWord#
                           (GHC.Prim.narrow32Word# (GHC.Prim.timesWord# (__word 3) wild_X1Z))
                           (__word 1)))
                     (__word 2))
                  (GHC.Prim.+# ww1_s60w 1);
              True ->
                $wlen_r6Sy
                  (GHC.Prim.quotWord# wild_X1Z (__word 2)) (GHC.Prim.+# ww1_s60w 1)
            };
          __word 1 -> ww1_s60w
        }
    end Rec }

    Allright, seems like it's the same. Except for the call of even. Let's replace even with one of the inline variants of Integral, e.g. x rem 2 == 0:

    import Data.Word
    import Data.List
    import System.Environment
    
    collatzNext :: Word32 -> Word32
    collatzNext a = (if a `rem` 2 == 0 then a else 3*a+1) `div` 2
    
    -- rest of code the same
    

    Let's compile it again with profiling and check:

    $ stack --resolver=ghc-7.10 ghc -- Test.hs -O2 -fforce-recomp -prof -fprof-auto -auto-all
    $ ./Test +RTS -s -p -RTS 
    (329,837799)
         416,119,240 bytes allocated in the heap
              69,760 bytes copied during GC
              59,368 bytes maximum residency (2 sample(s))
              21,912 bytes maximum slop
                   1 MB total memory in use (0 MB lost due to fragmentation)
    
                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0       800 colls,     0 par    0.000s   0.002s     0.0000s    0.0001s
      Gen  1         2 colls,     0 par    0.000s   0.000s     0.0002s    0.0003s
    
      INIT    time    0.000s  (  0.019s elapsed)
      MUT     time    2.500s  (  2.546s elapsed)
      GC      time    0.000s  (  0.003s elapsed)
      RP      time    0.000s  (  0.000s elapsed)
      PROF    time    0.000s  (  0.000s elapsed)
      EXIT    time    0.000s  (  0.000s elapsed)
      Total   time    2.500s  (  2.567s elapsed)
    
      %GC     time       0.0%  (0.1% elapsed)
    
      Alloc rate    166,447,696 bytes per MUT second
    
      Productivity 100.0% of total user, 97.4% of total elapsed
    
    $ cat Test.prof
            Fri Mar 11 20:22 2016 Time and Allocation Profiling Report  (Final)
    
               Test.exe +RTS -s -p -RTS 1000000
    
            total time  =        2.54 secs   (2535 ticks @ 1000 us, 1 processor)
            total alloc = 256,066,984 bytes  (excludes profiling overheads)
    
    COST CENTRE      MODULE  %time %alloc
    
    lenIterWhile.len Main     94.4    0.0
    main             Main      1.9   93.7
    collatzNext      Main      1.8    0.0
    lenIterWhile     Main      1.3    6.2
    
                                                                       individual     inherited
    COST CENTRE           MODULE                     no.     entries  %time %alloc   %time %alloc
    
    MAIN                  MAIN                        44           0    0.0    0.0   100.0  100.0
     main                 Main                        89           0    1.9   93.7   100.0  100.0
      main.\              Main                        92     1000000    0.4    0.0    98.1    6.2
       collatzLen         Main                        93     1000000    0.2    0.0    97.8    6.2
        lenIterWhile      Main                        94     1000000    1.3    6.2    97.5    6.2
         lenIterWhile.len Main                        95    88826840   94.4    0.0    96.2    0.0
          collatzNext     Main                        96    87826840    1.8    0.0     1.8    0.0
      main.max_a0         Main                        90           1    0.0    0.0     0.0    0.0
     CAF                  GHC.IO.Encoding.CodePage    73           0    0.0    0.0     0.0    0.0
     CAF                  System.Environment          64           0    0.0    0.0     0.0    0.0
     CAF                  GHC.IO.Handle.Text          62           0    0.0    0.0     0.0    0.0
     CAF                  GHC.IO.Encoding             61           0    0.0    0.0     0.0    0.0
    

    Seems like that fixed it. So the problem is that GHC-7.8 inlines even, while GHC-7.10 doesn't. This happens due to added {-# SPECIALISE even :: x -> x -> Bool #-} rules for Int and Integer, which don't allow inlining.

    As issue's discussion documents making even and odd {-# INLINEABLE ... #-} would resolve this issue. Note that the specialisation itself was added for perfomance reasons.

    0 讨论(0)
提交回复
热议问题