What is better, adjacency lists or adjacency matrix, for graph problems in C++? What are the advantages and disadvantages of each?
It depends on the problem.
Adjacency Matrix
Adjacency List
This answer is not specific to C++: everything mentioned concerns the data structures themselves, regardless of language. It also assumes that you know the basic structure of adjacency lists and matrices.
If memory is your primary concern you can follow this formula for a simple graph that allows loops:
An adjacency matrix occupies n^2/8 bytes (one bit per entry).
An adjacency list occupies 8e bytes, where e is the number of edges (on a 32-bit machine).
If we define the density of the graph as d = e/n^2 (number of edges divided by the maximum number of edges), we can find the "breakpoint" where a list takes up more memory than a matrix:
8e > n^2/8 when d > 1/64
So with these numbers (still 32-bit specific) the breakpoint lands at 1/64. If the density (e/n^2) is bigger than 1/64, then a matrix is preferable if you want to save memory.
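The comparison above can be sketched as a few small functions. The function names are made up for illustration, and the 8 bytes per edge figure is simply the 32-bit assumption from this answer, not a property of any particular container:

```cpp
#include <cstdint>

// Bytes used by a bit-packed adjacency matrix: one bit per (u, v) pair,
// rounded up to whole bytes.
std::uint64_t matrix_bytes(std::uint64_t n) {
    return (n * n + 7) / 8;
}

// Bytes used by adjacency lists, assuming 8 bytes per edge as in the
// answer above. Per-list overhead is ignored.
std::uint64_t list_bytes(std::uint64_t e) {
    return 8 * e;
}

// True when the matrix is the smaller of the two, which happens
// roughly when the density e / n^2 exceeds 1/64.
bool matrix_is_smaller(std::uint64_t n, std::uint64_t e) {
    return matrix_bytes(n) < list_bytes(e);
}
```

For n = 1000 the matrix takes 125000 bytes, so the lists stay smaller or equal up to 15625 edges, which is exactly n^2/64.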
You can read about this on Wikipedia (the article on adjacency matrices) and on a lot of other sites.
Side note: One can improve the space-efficiency of the adjacency matrix by using a hash table where the keys are pairs of vertices (undirected only).
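A minimal sketch of that side note, assuming an undirected graph whose vertex ids fit in 32 bits. The EdgeSet type and its method names are hypothetical, and a hash table carries its own per-entry overhead, so whether it actually saves memory depends on the graph and the standard library in use:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>

// Hypothetical helper: stores only the edges that exist, keyed by the
// unordered pair of endpoints packed into one 64-bit integer.
class EdgeSet {
public:
    void add_edge(std::uint32_t u, std::uint32_t v) { edges_.insert(key(u, v)); }

    // Average-case constant-time edge lookup.
    bool has_edge(std::uint32_t u, std::uint32_t v) const {
        return edges_.count(key(u, v)) != 0;
    }

    std::size_t size() const { return edges_.size(); }

private:
    // Order the endpoints so that (u, v) and (v, u) map to the same key.
    static std::uint64_t key(std::uint32_t u, std::uint32_t v) {
        if (u > v) std::swap(u, v);
        return (static_cast<std::uint64_t>(u) << 32) | v;
    }

    std::unordered_set<std::uint64_t> edges_;
};
```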
Adjacency lists are a compact way of representing only existing edges. However, this comes at the cost of possibly slow lookup of specific edges. Since each list is as long as the degree of a vertex the worst case lookup time of checking for a specific edge can become O(n), if the list is unordered. However, looking up the neighbours of a vertex becomes trivial, and for a sparse or small graph the cost of iterating through the adjacency lists might be negligible.
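As a rough illustration of those trade-offs, here is a small directed adjacency list built on std::vector. The ListGraph name and its members are assumptions made for this sketch:

```cpp
#include <algorithm>
#include <vector>

// Sketch of a directed graph stored as unordered adjacency lists.
struct ListGraph {
    std::vector<std::vector<int>> adj;  // adj[u] holds the targets of u's edges

    explicit ListGraph(int n) : adj(n) {}

    void add_edge(int u, int v) { adj[u].push_back(v); }

    // Linear scan of u's list: O(degree(u)), which is O(n) in the worst case.
    bool has_edge(int u, int v) const {
        return std::find(adj[u].begin(), adj[u].end(), v) != adj[u].end();
    }

    // Neighbour lookup is just a reference to the stored list.
    const std::vector<int>& neighbours(int u) const { return adj[u]; }
};
```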
Adjacency matrices on the other hand use more space in order to provide constant lookup time. Since every possible entry exists you can check for the existence of an edge in constant time using indexes. However, neighbour lookup takes O(n) since you need to check all possible neighbours. The obvious space drawback is that for sparse graphs a lot of padding is added. See the memory discussion above for more information on this.
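A comparable sketch for the matrix, flattened into a single std::vector<bool> (which is typically bit-packed, though the standard leaves the exact layout to the implementation). The MatrixGraph name and its members are assumptions made for this sketch:

```cpp
#include <cstddef>
#include <vector>

// Sketch of a directed graph stored as a flattened n x n matrix.
struct MatrixGraph {
    int n;
    std::vector<bool> cell;  // cell[u * n + v] is true when edge u -> v exists

    explicit MatrixGraph(int n_)
        : n(n_), cell(static_cast<std::size_t>(n_) * n_, false) {}

    void add_edge(int u, int v) { cell[index(u, v)] = true; }

    // Constant-time edge lookup by index.
    bool has_edge(int u, int v) const { return cell[index(u, v)]; }

    // Neighbour lookup has to test all n candidates: O(n).
    std::vector<int> neighbours(int u) const {
        std::vector<int> out;
        for (int v = 0; v < n; ++v) {
            if (has_edge(u, v)) out.push_back(v);
        }
        return out;
    }

private:
    std::size_t index(int u, int v) const {
        return static_cast<std::size_t>(u) * n + v;
    }
};
```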
If you're still unsure what to use: Most real-world problems produce sparse and/or large graphs, which are better suited for adjacency list representations. They might seem harder to implement but I assure you they aren't, and when you write a BFS or DFS and want to fetch all neighbours of a node they're just one line of code away. However, note that I'm not promoting adjacency lists in general.
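To make the "one line of code away" point concrete, here is a minimal BFS over an adjacency list stored as a vector of vectors. The function name and the convention of returning -1 for unreachable vertices are choices made for this sketch:

```cpp
#include <queue>
#include <vector>

// Returns the distance in edges from source to every vertex,
// or -1 for vertices that cannot be reached.
std::vector<int> bfs_distances(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> frontier;
    dist[source] = 0;
    frontier.push(source);
    while (!frontier.empty()) {
        int u = frontier.front();
        frontier.pop();
        for (int v : adj[u]) {  // fetching all neighbours of u is this one line
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                frontier.push(v);
            }
        }
    }
    return dist;
}
```

With a matrix, the inner loop would instead have to scan all n columns of row u and skip the empty ones.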
To add to keyser5053's answer about memory usage.
For any directed graph, an adjacency matrix (at 1 bit per edge) consumes n^2 * (1) bits of memory.
For a complete graph, an adjacency list (with 64 bit pointers) consumes n * (n * 64) bits of memory, excluding list overhead.
For a graph with no edges at all, an adjacency list consumes 0 bits of memory, excluding list overhead.
For an adjacency list, you can use the following formula to determine the maximum number of edges (e) before an adjacency matrix becomes optimal for memory:
edges = n^2 / s
where s is the pointer size of the platform in bits.
If your graph is dynamically updating, you can maintain this efficiency with an average edge count (per node) of n / s.
Some examples with 64-bit pointers and a dynamic graph. (A dynamic graph updates the solution of a problem efficiently after changes, rather than recomputing it from scratch each time a change has been made.)
For a directed graph where n is 300, the optimal number of edges per node using an adjacency list is:
300 / 64 = 4 (rounded down)
If we plug this into keyser5053's formula, d = e / n^2 (where e is the total edge count), we can see we are below the break point (1 / s):
d = (4 * 300) / (300 * 300) = 0.0133 < 1/64 = 0.0156
However, 64 bits for a pointer can be overkill. If you instead use 16-bit integers as pointer offsets, each node can hold up to 18 edges before the break point:
300 / 16 = 18 (rounded down)
d = (18 * 300) / 300^2 = 0.06 < 1/16 = 0.0625
Each of these examples ignores the overhead of the adjacency lists themselves (64*2 for a vector and 64-bit pointers).
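The two worked examples can be reproduced with a few helper functions. The names are made up for this sketch, s is taken to be the size of one stored edge reference in bits, and list overhead is ignored just as in the text:

```cpp
#include <cstdint>

// Maximum total edge count before a 1-bit-per-entry matrix becomes
// smaller than the lists: n^2 / s, rounded down.
std::uint64_t max_total_edges(std::uint64_t n, std::uint64_t s) {
    return n * n / s;
}

// The same limit expressed as an average edge count per node: n / s,
// rounded down.
std::uint64_t max_edges_per_node(std::uint64_t n, std::uint64_t s) {
    return n / s;
}

// Density d = e / n^2, to compare against the break point 1 / s.
double density(std::uint64_t e, std::uint64_t n) {
    return static_cast<double>(e) / (static_cast<double>(n) * static_cast<double>(n));
}
```

Calling max_edges_per_node(300, 64) gives 4 and max_edges_per_node(300, 16) gives 18, matching the figures above.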
Depending on the adjacency matrix implementation, n (the number of vertices) should be known in advance for an efficient implementation. If the graph is very dynamic and the matrix has to be expanded every now and then, that can arguably be counted as a downside too.
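A sketch of why growth is awkward, assuming the matrix is stored as one flattened std::vector<bool> (other layouts behave differently). Both function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Grow a flattened n x n matrix to (n+1) x (n+1). Every row moves to a
// new offset, so all n^2 existing cells are copied: O(n^2) per new vertex.
std::vector<bool> add_vertex_matrix(const std::vector<bool>& old_cells, std::size_t n) {
    std::size_t m = n + 1;
    std::vector<bool> cells(m * m, false);
    for (std::size_t u = 0; u < n; ++u) {
        for (std::size_t v = 0; v < n; ++v) {
            cells[u * m + v] = old_cells[u * n + v];
        }
    }
    return cells;
}

// With adjacency lists, a new vertex is one empty list appended at the
// end: amortised O(1), ignoring the cost of vector reallocation.
void add_vertex_list(std::vector<std::vector<int>>& adj) {
    adj.emplace_back();
}
```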
It depends on what you're looking for.
With adjacency matrices you can quickly answer whether a specific edge between two vertices belongs to the graph, and you can also insert and delete edges quickly. The downside is the excessive space they require, especially for graphs with many vertices, which is very inefficient if your graph is sparse.
On the other hand, with adjacency lists it is harder to check whether a given edge is in a graph, because you have to search through the appropriate list to find the edge, but they are more space efficient.
Generally though, adjacency lists are the right data structure for most applications of graphs.