Optimising Haskell data reading from file

忘掉有多难 2021-02-07 18:16

I am trying to implement Kosaraju's graph algorithm on a 3.5m-line file where each row is two (space-separated) Ints representing a graph edge. To start I need to create a su

3 answers
  •  逝去的感伤
    2021-02-07 18:59

    Based largely on András' suggestions, I've cut a 113-second task down to 24 seconds (measured by stopwatch, as I can't quite get Criterion to do anything yet), and then down to 10 by compiling with -O2! I've attended some courses this past year that talked about the challenge of optimising for large datasets, but this was the first time I faced a problem that actually involved one, and it was as non-trivial as my instructors suggested. This is what I have now:

    import System.IO
    import Control.Monad
    import Criterion.Main  -- defaultMain, bgroup, bench, whnf, whnfIO (criterion package)
    import Data.List (foldl')
    import qualified Data.IntMap.Strict as IM
    import qualified Data.ByteString.Char8 as BS
    
    type NodeName = Int
    type Edges = [NodeName]
    type Explored = Bool
    
    data Node = Node Explored Edges Edges deriving (Eq, Show)
    type Graph1 = IM.IntMap Node
    
    -- DFS uses a stack to store next points to explore, a list can do this
    type Stack = [(NodeName, NodeName)]
    
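    -- Read the whole file as a strict ByteString and parse each line as two Ints.
    -- Lines that do not contain exactly two fields are silently skipped.
    getBytes :: FilePath -> IO [(Int, Int)]
    getBytes path = do
        rows <- (map BS.words . BS.lines) <$> BS.readFile path
        let pairs = (map . map) (maybe (error "Can't read integers") fst . BS.readInt) rows
        return [(a, b) | [a, b] <- pairs]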
    
    main :: IO ()
    main = do
        list <- getBytes "SCC.txt"  -- [(Int, Int)], one pair per edge
        let list' = createGraph' list
        print (list' IM.! 66)
    
    
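    -- Criterion entry point; rename to main (or call it from main) to run it.
    bmark :: IO ()
    bmark = defaultMain [
        bgroup "1" [
            -- whnfIO actually runs the IO action; whnf would only evaluate
            -- the unexecuted action to WHNF and measure nothing useful
            bench "Sim test" $ whnfIO (bmark' "SCC.txt")
            ]
        ]
    
    -- Returns the looked-up node instead of printing it, so the benchmark
    -- does not write to stdout on every iteration
    bmark' :: FilePath -> IO Node
    bmark' path = do
        list <- getBytes path
        let list' = createGraph' list
        return $! list' IM.! 2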
    
    
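    -- Build the graph with a strict left fold: each edge (x, y) adds y to
    -- x's forward list and x to y's backward list.
    createGraph' :: [(Int, Int)] -> Graph1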
    createGraph' xs = foldl' build IM.empty xs
        where
            addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
            addFwd y _                   = Just (Node False [y] [])
            addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
            addBwd x _                   = Just (Node False [] [x])
    
            build :: Graph1 -> (Int, Int) -> Graph1
            build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc 
    
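To see what the parsing and the fold do without needing the 3.5m-line file, here is a minimal sketch that runs the same two steps on a four-edge in-memory sample. It is not the code from the post: `parseEdge`, `Adjacency`, `buildAdjacency` and `sample` are hypothetical names introduced for illustration, the node is simplified to a pair of (forward, backward) lists, and malformed lines yield `Nothing` rather than calling `error`.

```haskell
import Data.List (foldl')
import qualified Data.ByteString.Char8 as BS
import qualified Data.IntMap.Strict as IM

-- Hypothetical helper (not in the original post): parse one "a b" line.
-- Returns Nothing for a wrong field count or trailing non-digit characters.
parseEdge :: BS.ByteString -> Maybe (Int, Int)
parseEdge line = case BS.words line of
    [a, b] -> do
        (x, restX) <- BS.readInt a
        (y, restY) <- BS.readInt b
        if BS.null restX && BS.null restY then Just (x, y) else Nothing
    _ -> Nothing

-- Simplified stand-in for the post's Node: (forward edges, backward edges)
type Adjacency = IM.IntMap ([Int], [Int])

-- Strict left fold over the edges, as in createGraph', but using insertWith.
-- The new value is always a one-element list, so each merge is O(1).
buildAdjacency :: [(Int, Int)] -> Adjacency
buildAdjacency = foldl' step IM.empty
  where
    step acc (x, y) =
        IM.insertWith merge y ([], [x]) (IM.insertWith merge x ([y], []) acc)
    merge (newF, newB) (oldF, oldB) = (newF ++ oldF, newB ++ oldB)

-- Assumed sample input: four edges, one per line
sample :: BS.ByteString
sample = BS.pack "1 2\n2 3\n3 1\n3 4\n"

main :: IO ()
main = case mapM parseEdge (BS.lines sample) of
    Nothing    -> putStrLn "malformed input"
    Just edges -> print (IM.lookup 3 (buildAdjacency edges))
    -- prints: Just ([4,1],[2])
```

Node 3 ends up with forward edges to 4 and 1 and a backward edge from 2, which is the same shape the `Node` constructor holds in its two `Edges` fields. Whether `insertWith` or `alter` is faster on the real data is something to measure rather than assume.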

    And now on with the rest of the exercise....
