I am trying to implement Kosaraju\'s graph algorithm, on a 3.5m line file where each row is two (space separated) Ints representing a graph edge. To start I need to create a su
IntMap
or HashMap
when possible. Both are significantly faster for Int
keys than Map
. HashMap
is usually faster than IntMap
but uses more RAM and has a less rich library.containers
package has a large number of specialized functions. With alter the number of lookups can be halved compared to the createGraph
implementation in the question. Example for createGraph
:
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
createGraph :: [(Int, Int)] -> Graph1
createGraph xs = foldl' build IM.empty xs
where
addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
addFwd y _ = Just (Node False [y] [])
addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
addBwd x _ = Just (Node False [] [x])
build :: Graph1 -> (Int, Int) -> Graph1
build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
generate
, iterate
, constructN
, etc.). These may use mutation behind the scenes but are considerably more convenient to use than actual mutable vectors.Example for createGraph
:
import qualified Data.Vector as V
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = V.Vector Node
createGraph :: Int -> [(Int, Int)] -> Graph1
createGraph maxIndex edges = graph'' where
graph = V.replicate maxIndex (Node False [] [])
graph' = V.accum (\(Node e f b) x -> Node e (x:f) b) graph edges
graph'' = V.accum (\(Node e f b) x -> Node e f (x:b)) graph' (map (\(a, b) -> (b, a)) edges)
Note that if there are gaps in the range of the node indices, then it'd be wise to either
Node
to signify a missing index. Data.Text
or Data.ByteString
. In both cases there are also efficient functions for breaking input into lines or words. Example:
import qualified Data.ByteString.Char8 as BS
import System.IO
getLines :: FilePath -> IO [(Int, Int)]
getLines path = do
lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
return [(a, b) | [a, b] <- pairs]
Always do it, unlike me in this answer. Use criterion.