I am trying to implement Kosaraju\'s graph algorithm, on a 3.5m line file where each row is two (space separated) Ints representing a graph edge. To start I need to create a su
This is not really an answer, I would rather comment András Kovács post, if I add those 50 points...
I have implemented the loading of the graph in both IntMap and MVector, in a attempt to benchmark mutability vs. immutability.
Both program use Attoparsec for the parsing. There is surely more economic way to do it, but Attoparsec is relatively fast compared to its high abstraction level (the parser can stand in one line). The guideline is to avoid String
and read
. read
is partial and slow, [Char]
is slow and not memory efficient, unless properly fused.
As András Kovács noted, IntMap is better than Map for Int keys. My code provides another example of alter
usage. If the node identifier mapping is dense, you may also want to use Vector and Array. They allow O(1) indexing by the identifier.
The mutable version handle on demand the exponential growth of the MVector. This avoid to precise an upper bound on node identifiers, but introduce more complexity (the reference on the vector may change).
I benchmarked with a file of 5M edges with identifiers in the range [0..2^16]. The MVector version is ~2x faster than the IntMap code (12s vs 25s on my computer).
The code is here [Gist].
I will edit when more profiling is done on my side.