No speedup with naive merge sort parallelization in Haskell

前端 未结 2 1891
深忆病人
深忆病人 2021-02-14 17:14

Note: This post was completely rewritten 2011-06-10; thanks to Peter for helping me out. Also, please don\'t be offended if I don\'t accept one answer, since this quest

相关标签:
2条回答
  • 2021-02-14 17:55

    The answer is pretty easy: Because you have at no point introduced parallelism. Eval is just a monad to order computations, you have to ask for things to be executed in parallel manually. What you probably want is:

    do xr <- rpar $ runEval $ mergeSort' x
       yr <- rseq $ runEval $ mergeSort' y
       rseq (merge xr yr)
    

    This will make Haskell actually create a spark for the first computation, instead of trying to evaluate it on the spot.

    Standard tips also kind-of apply:

    1. The result should be evaluated deeply (e.g. using evalTraversable rseq). Otherwise you will only force the head of the tree, and the bulk of the data will just be returned unevaluated.
    2. Just sparking everything will most likely eat up any gains. It would be a good idea to introduce a parameter that stops sparking at lower recursion levels.

    Edit: The following actually doesn't apply anymore after the question edit

    But the worst part last: Your algorithm as you state it is very flawed. Your top-level seq only forces the first cons-cell of the list, which allows GHC to use lazyness to great effect. It will never actually construct the result list, just plow through all of them in a search for the minimum element (that's not even strictly needed, but GHC only produces the cell after the minimum is known).

    So don't be surprised when performance actually drops sharply when you start introducing parallelism under the assumptions that you need the whole list at some point in the program...

    Edit 2: Some more answers to the edits

    The biggest problem with your program is probably that it is using lists. If you want to make more than a toy example, consider at least using (unpacked) Arrays. If you want to go into serious number-crunching, maybe consider a specialised library like repa.

    On "Further Discussion":

    • The colors stand for different GC states, I can't remember which. Try to look at the event log for the associated event.

    • The way to "sidestep" garbage collection is to not produce so much garbage in the first place, e.g. by using better data structures.

    • Well, if you are looking for an inspiration on robust parallelization it might be worthwhile to have a look at monad-par, which is relatively new but (I feel) less "surprising" in its parallel behaviour.

    With monad-par, your example might become something like:

      do xr <- spawn $ mergeSort' x
         yr <- spawn $ mergeSort' y
         merge <$> get xr <*> get yr
    

    So here the get actually forces you to specify the join points - and the library does the required deepseq automatically behind the scenes.

    0 讨论(0)
  • 2021-02-14 18:02

    I had similar luck to what you report in EDIT 3 on a dual core system with these variants. I used a smaller list length because I'm on a smaller computer, compiled with ghc -O2 -rtsopts -threaded MergePar.hs, and ran with ./MergePar +RTS -H256M -N. This might offer a more structured way to compare performance. Note that the RTS option -qa sometimes helps the simple par variants.

    import Control.Applicative
    import Control.Parallel
    import Control.Parallel.Strategies
    import Criterion.Main
    import GHC.Conc (numCapabilities)
    
    data Tree a = Leaf a | Node (Tree a) (Tree a) deriving Show
    
    listToTree [] = error "listToTree -- empty list"
    listToTree [x] = Leaf x
    listToTree xs = Node (listToTree (take half xs)) (listToTree (drop half xs))
      where half = length xs `div` 2
    
    -- Merge two ordered lists
    merge :: Ord a => [a] -> [a] -> [a]
    merge [] y = y
    merge x [] = x
    merge (x:xs) (y:ys) | x < y = x : merge xs (y:ys)
                        | otherwise = y : merge (x:xs) ys
    
    -- Simple merge sort
    mergeSort' :: Ord a => Tree a -> [a]
    mergeSort' (Leaf v) = [v]
    mergeSort' (Node x y) = merge (mergeSort' x) (mergeSort' y)
    
    mergeSort :: Ord a => [a] -> [a]
    mergeSort = mergeSort' . listToTree
    
    -- Merge sort with 'par' annotations on every recursive call
    mergeSortP' :: Ord a => Tree a -> [a]
    mergeSortP' (Leaf v) = [v]
    mergeSortP' (Node x y) = let xr = mergeSortP' x
                                 yr = mergeSortP' y
                             in xr `par` yr `pseq` merge xr yr
    
    mergeSortP :: Ord a => [a] -> [a]
    mergeSortP = mergeSortP' . listToTree
    
    -- Merge sort with 'rpar' annotations on every recursive call
    mergeSortR' :: Ord a => Tree a -> [a]
    mergeSortR' (Leaf v) = [v]
    mergeSortR' (Node x y) = 
      runEval $ merge <$> rpar (mergeSortR' x) <*> rpar (mergeSortR' y)
    
    mergeSortR :: Ord a => [a] -> [a]
    mergeSortR = mergeSortR' . listToTree
    
    -- Parallel merge sort that stops looking for parallelism at a certain
    -- depth
    smartMerge' :: Ord a => Int -> Tree a -> [a]
    smartMerge' _ (Leaf v) = [v]
    smartMerge' n t@(Node x y)
      | n <= 1 = mergeSort' t
      | otherwise = let xr = smartMerge' (n-1) x
                        yr = smartMerge' (n-2) y
                    in xr `par` yr `pseq` merge xr yr
    
    smartMerge :: Ord a => [a] -> [a]
    smartMerge = smartMerge' numCapabilities . listToTree
    
    main = defaultMain $ [ bench "original" $ nf mergeSort lst
                         , bench "par" $ nf mergeSortP lst
                         , bench "rpar" $ nf mergeSortR lst
                         , bench "smart" $ nf smartMerge lst ]
      where lst = [100000,99999..0] :: [Int]
    
    0 讨论(0)
提交回复
热议问题