Which is better for graph problems in C++: an adjacency list or an adjacency matrix? What are the advantages and disadvantages of each?
If you are looking at graph analysis in C++, probably the first place to start would be the Boost Graph Library, which implements a number of algorithms, including BFS.
EDIT
This previous question on SO will probably help:
how-to-create-a-c-boost-undirected-graph-and-traverse-it-in-depth-first-search
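As a quick taste of the library, here is a minimal, hedged sketch of a BFS over a boost::adjacency_list (the small custom visitor prints vertices in discovery order; treat the details as an example, not authoritative BGL usage):

    #include <iostream>
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/breadth_first_search.hpp>

    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS> Graph;
    typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;

    // Visitor that prints each vertex in the order BFS discovers it.
    struct PrintVisitor : boost::default_bfs_visitor {
        void discover_vertex(Vertex u, const Graph&) const { std::cout << u << " "; }
    };

    int main() {
        Graph g(5);                      // vertices 0..4
        boost::add_edge(0, 1, g);
        boost::add_edge(0, 2, g);
        boost::add_edge(1, 3, g);
        boost::add_edge(2, 4, g);

        // BFS from vertex 0; prints "0 1 2 3 4"
        boost::breadth_first_search(g, boost::vertex(0, g), boost::visitor(PrintVisitor()));
    }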
I am just going to touch on overcoming the trade-offs of the regular adjacency list representation, since other answers have covered the other aspects.
It is possible to represent a graph as an adjacency list with an EdgeExists query in amortized constant time, by taking advantage of Dictionary and HashSet data structures. The idea is to keep the vertices in a dictionary and, for each vertex, a hash set referencing the other vertices it has edges with.
One minor trade-off in this implementation is that its space complexity is O(V + 2E) rather than the O(V + E) of a regular adjacency list (asymptotically the same, but with each edge represented twice, because each vertex has its own hash set of edges). In exchange, operations such as AddVertex, AddEdge and RemoveEdge can be done in amortized O(1) time, except for RemoveVertex, which takes O(V), as with an adjacency matrix. This means that, other than implementation simplicity, an adjacency matrix has no specific advantage: we can save space on a sparse graph with almost the same performance in this adjacency list implementation.
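In C++ terms the same idea is a hash map of hash sets. A minimal sketch under that assumption (the class and method names here are mine, not taken from the repository below):

    #include <unordered_map>
    #include <unordered_set>

    // Undirected graph as a hash map of hash sets; every edge is stored twice.
    class HashGraph {
        std::unordered_map<int, std::unordered_set<int>> adj;
    public:
        void addVertex(int v)         { adj.emplace(v, std::unordered_set<int>{}); }
        void addEdge(int u, int v)    { adj[u].insert(v); adj[v].insert(u); }   // amortized O(1)
        void removeEdge(int u, int v) { adj[u].erase(v);  adj[v].erase(u); }    // amortized O(1)
        bool edgeExists(int u, int v) const {                                   // amortized O(1)
            auto it = adj.find(u);
            return it != adj.end() && it->second.count(v) > 0;
        }
    };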
Take a look at the implementations in the C# GitHub repository below for details. Note that for a weighted graph it uses a nested dictionary instead of the dictionary and hash set combination, so as to accommodate the weight value. Similarly, for a directed graph there are separate hash sets for in and out edges.
Advanced-Algorithms
Note: I believe that with lazy deletion we can further optimize RemoveVertex to amortized O(1), though I haven't tested the idea. For example, upon deletion, just mark the vertex as deleted in the dictionary, and then lazily clear orphaned edges during other operations.
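Sketching that untested idea in the same C++ terms (again, names are mine): removing a vertex only marks it, and orphaned entries in neighbours' hash sets are purged whenever they are next encountered.

    #include <unordered_map>
    #include <unordered_set>

    class LazyHashGraph {
        std::unordered_map<int, std::unordered_set<int>> adj;
        std::unordered_set<int> deleted;
    public:
        void addEdge(int u, int v) { adj[u].insert(v); adj[v].insert(u); }
        void removeVertex(int v)   { deleted.insert(v); adj.erase(v); }   // amortized O(1)
        bool edgeExists(int u, int v) {
            if (deleted.count(u)) return false;
            auto it = adj.find(u);
            if (it == adj.end() || it->second.count(v) == 0) return false;
            if (deleted.count(v)) {            // orphaned edge: clear it lazily
                it->second.erase(v);
                return false;
            }
            return true;
        }
    };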
Let's assume we have a graph with n nodes and m edges:
Example graph
Adjacency Matrix: We create a matrix with n rows and n columns, so in memory it takes space proportional to n². Checking whether two nodes u and v have an edge between them takes Θ(1) time. For example, checking whether (1, 2) is an edge looks like this in code:

    if (matrix[1][2] == 1)

If you want to identify all edges, you have to iterate over the matrix; this requires two nested loops and takes Θ(n²). (You may use just the upper triangular part of the matrix to determine all edges, but it will still be Θ(n²).)
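To make both operations concrete, here is a small self-contained C++ sketch (the size n = 4 is illustrative, not taken from the example graph):

    #include <iostream>
    #include <vector>

    int main() {
        int n = 4;                                   // illustrative number of nodes
        std::vector<std::vector<int>> matrix(n, std::vector<int>(n, 0));
        matrix[1][2] = matrix[2][1] = 1;             // undirected edge (1, 2)

        if (matrix[1][2] == 1)                       // Θ(1) edge check
            std::cout << "(1, 2) is an edge\n";

        for (int u = 0; u < n; ++u)                  // listing all edges: Θ(n²)
            for (int v = u + 1; v < n; ++v)          // upper triangle is enough
                if (matrix[u][v] == 1)
                    std::cout << u << " - " << v << "\n";
    }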
Adjacency List: We create a list in which each node points to another list. The outer list has n elements, and each element points to a list whose number of items equals the number of neighbours of that node (see the image below for a better visualization). So it takes space in memory proportional to n + m. Checking whether (u, v) is an edge takes O(deg(u)) time, where deg(u) is the number of neighbours of u, because at worst you have to iterate over the whole list pointed to by u. Identifying all edges takes Θ(n + m).
Adjacency list of example graph
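The same operations with the list-based representation, again as a minimal sketch with an illustrative size:

    #include <iostream>
    #include <vector>

    int main() {
        int n = 4;                                   // illustrative number of nodes
        std::vector<std::vector<int>> adj(n);        // adj[u] holds the neighbours of u
        adj[1].push_back(2);                         // undirected edge (1, 2),
        adj[2].push_back(1);                         // stored in both lists

        int u = 1, v = 2;                            // O(deg(u)) edge check:
        bool found = false;
        for (int w : adj[u])
            if (w == v) { found = true; break; }
        std::cout << (found ? "edge" : "no edge") << "\n";

        for (int a = 0; a < n; ++a)                  // listing all edges: Θ(n + m)
            for (int b : adj[a])
                if (a < b)                           // print each edge once
                    std::cout << a << " - " << b << "\n";
    }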
You should make your choice according to your needs.
If you use a hash table instead of either an adjacency matrix or an adjacency list, you'll get better or equal big-O run time and space for all operations (checking for an edge is O(1), getting all adjacent edges is O(degree), etc.).
There is some constant-factor overhead, though, in both run time and space: a hash table isn't as fast as a linked-list or array lookup, and it takes a decent amount of extra space to reduce collisions.
Okay, I've compiled the Time and Space complexities of basic operations on graphs.
The image below should be self-explanatory.
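In case the image doesn't come through, the standard figures such a comparison contains are roughly these (V vertices, E edges, deg(u) = degree of u):

    Operation           Adjacency Matrix    Adjacency List
    Space               O(V²)               O(V + E)
    Add vertex          O(V²)               O(1)
    Add edge            O(1)                O(1)
    Remove vertex       O(V²)               O(V + E)
    Remove edge         O(1)                O(deg(u) + deg(v))
    Edge query (u, v)   O(1)                O(deg(u))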
Notice how Adjacency Matrix is preferable when we expect the graph to be dense, and how Adjacency List is preferable when we expect the graph to be sparse.
I've made some assumptions, so ask me if a complexity (time or space) needs clarification. (For example, for a sparse graph I've taken En to be a small constant, as I've assumed that adding a new vertex adds only a few edges, because we expect the graph to remain sparse even after adding that vertex.)
Please tell me if there are any mistakes.
This is best answered with examples.
Think of Floyd-Warshall for example. We have to use an adjacency matrix, or the algorithm will be asymptotically slower.
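To see why, a standard Floyd-Warshall kernel reads and writes dist[i][j] in O(1), which is exactly the access pattern an adjacency matrix provides. A minimal sketch, with an INF guard against overflow:

    #include <algorithm>
    #include <vector>

    const long long INF = 1e18;          // "no edge" sentinel

    // dist starts out as the adjacency matrix (INF where there is no edge,
    // 0 on the diagonal) and ends as the all-pairs shortest distances.
    void floydWarshall(std::vector<std::vector<long long>>& dist) {
        int n = (int)dist.size();
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    if (dist[i][k] < INF && dist[k][j] < INF)
                        dist[i][j] = std::min(dist[i][j], dist[i][k] + dist[k][j]);
    }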
Or what if it's a dense graph on 30,000 vertices? Then an adjacency matrix might make sense, as you'll be storing 1 bit per pair of vertices (30,000² bits, about 107 MB), rather than the 16 bits per edge entry that an adjacency list would minimally need. With roughly n²/2 edges each appearing in two lists, that comes to about 1.7 GB.
But for algorithms like DFS and BFS (and those built on them, like Edmonds-Karp), priority-first search (Dijkstra, Prim, A*), etc., an adjacency list is as good as a matrix. Well, a matrix might have a slight edge when the graph is dense, but only by an unremarkable constant factor. (How much? That's a matter of experimenting.)
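For instance, a BFS over an adjacency list touches each vertex once and each incident edge once, so the whole traversal is O(V + E); over a matrix the same loop would be forced to scan a full row per vertex, i.e. O(V²). A minimal sketch:

    #include <queue>
    #include <vector>

    // Returns vertices in BFS discovery order, starting from src.
    std::vector<int> bfsOrder(const std::vector<std::vector<int>>& adj, int src) {
        std::vector<bool> visited(adj.size(), false);
        std::vector<int> order;
        std::queue<int> q;
        visited[src] = true;
        q.push(src);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            order.push_back(u);
            for (int v : adj[u])                 // each edge examined O(1) times
                if (!visited[v]) { visited[v] = true; q.push(v); }
        }
        return order;
    }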