Consider a directed graph which is traversed from first node 1
to some final nodes (which have no more outgoing edges). Each edge in the graph has a probability ass
Problem Clarification
The input data is a set of m rows of n columns of probabilities, essentially an m by n matrix, where m = n = number of vertices on a directed graph. Rows are edge origins and columns are edge destinations. We will assume, on the basis of the mention of cycles in the question, that the graph is cyclic, that is, that at least one cycle exists in the graph.
Let's define the starting vertex as s. Let's also define a terminal vertex as a vertex for which there are no exiting edges, and the set of them as set T with size z. Therefore we have z sets of routes from s to a vertex in T, and the set sizes may be infinite due to cycles [1]. In such a scenario, one cannot conclude that a terminal vertex will be reached within any fixed number of steps, however large.
In the input data, probabilities for rows that correspond with vertices not in T are normalized to total to 1.0. We shall assume the Markov property, that the probabilities at each vertex do not vary with time. This precludes the use of probability to prioritize routes in a graph search [2].
Finite math texts sometimes name example problems similar to this question Drunken Random Walks to underscore the fact that the walker forgets the past, referring to the memory-free nature of Markov chains.
Applying Probability to Routes
The probability of arriving at a terminal vertex can be expressed as an infinite series sum of products.
$P_t = \lim_{s \to \infty} \sum \prod P_{i,j}$,
where s is the step index, t is a terminal vertex index, i ∈ [1 .. m] and j ∈ [1 .. n]
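As a small worked case of this series (an illustrative assumption, not a graph taken from the question), consider a start vertex s, one further non-terminal vertex c, and two terminal vertices t_1 and t_2. Assume the hypothetical edge probabilities P_{s,t_1} = a, P_{s,c} = 1 - a, P_{c,s} = b and P_{c,t_2} = 1 - b, with (1 - a) b < 1. Every route passes through the cycle s -> c -> s some number of times k before leaving it, so each sum of products is a geometric series:

```latex
\begin{aligned}
P_{t_1} &= \sum_{k=0}^{\infty} \bigl[(1-a)\,b\bigr]^{k}\, a
         = \frac{a}{1-(1-a)\,b}, \\
P_{t_2} &= \sum_{k=0}^{\infty} \bigl[(1-a)\,b\bigr]^{k}\,(1-a)(1-b)
         = \frac{(1-a)(1-b)}{1-(1-a)\,b}, \\
P_{t_1} + P_{t_2} &= \frac{a + (1-a)(1-b)}{1-(1-a)\,b} = 1 .
\end{aligned}
```

The last line follows because a + (1 - a)(1 - b) = 1 - (1 - a) b. A closed form like this is available here only because the graph contains a single cycle.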
Reduction
When two or more cycles intersect (sharing one or more vertices), analysis is complicated by an infinite set of patterns involving them. It appears, after some analysis and review of relevant academic work, that arriving at an accurate set of terminal vertex arrival probabilities with today's mathematical tools may best be accomplished with a converging algorithm.
A few initial reductions are possible.
The first consideration is to enumerate the terminal vertices, which is easy since every entry in the corresponding rows is zero.
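A minimal sketch of that enumeration follows, assuming the matrix is held as a vector of rows. The alias Matrix and the function name findTerminalVertices are hypothetical names introduced here for illustration, and vertices are indexed from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias: row i holds the probabilities of the edges leaving vertex i.
using Matrix = std::vector<std::vector<double>>;

// Returns the indices of all vertices whose row contains only zeros,
// that is, the vertices with no exiting edges.
std::vector<std::size_t> findTerminalVertices(const Matrix& p) {
    std::vector<std::size_t> terminals;
    for (std::size_t i = 0; i < p.size(); ++i) {
        assert(p[i].size() == p.size());  // the matrix is assumed to be square
        bool hasExit = false;
        for (double prob : p[i]) {
            if (prob != 0.0) {
                hasExit = true;
                break;
            }
        }
        if (!hasExit)
            terminals.push_back(i);
    }
    return terminals;
}
```

The result is the set T described earlier, and its size is z.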
The next consideration is to differentiate any further reductions from what the academic literature calls irreducible sub-graphs. The depth first algorithm given later in this answer remembers which vertices have already been visited while constructing a potential route, so it can easily be retrofitted to identify which vertices are involved in cycles. However, it is recommended to use existing well tested, peer reviewed graph libraries to identify and characterize sub-graphs as irreducible.
Mathematical reduction of irreducible portions of the graph may or may not be plausible. Consider starting vertex A and sole terminating vertex B in the graph represented as {A->C, C->A, A->D, D->A, C->D, D->C, C->B, D->B}.
Although one can reduce the graph to probability relations absent of cycles through vertex A, the vertex A cannot be removed for further reduction without either modifying the probabilities of the edges exiting C and D or allowing the totals of the probabilities of the edges exiting C and D to both be less than 1.0.
Convergent Breadth First Traversal
A breadth first traversal that ignores revisiting and allows cycles can iterate step index s, not to some fixed smax but to some sufficiently stable and accurate point in a convergent trend. This approach is especially called for if cycles overlap creating bifurcations in the simpler periodicity caused by a single cycle.
$\sum P_s \, \Delta s$.
For the establishment of a reasonable convergence as s increases, one must determine the desired accuracy as a criterion for completing the convergence algorithm, and a metric for measuring accuracy by looking at longer term trends in results at all terminal vertices. It may be important to provide a criterion requiring that the sum of terminal vertex probabilities be close to unity, in conjunction with the trend convergence metric, as both a sanity check and an accuracy criterion. Practically, four convergence criteria may be necessary [3].
Even beyond these four, the program may need to contain a trap for an interrupt that permits the writing and subsequent examination of output after a long wait without the satisfying of all four above criteria.
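The following is a minimal sketch of such a convergent iteration, not a definitive implementation. It pushes the probability mass forward one step at a time, moves any mass that lands on a terminal vertex into an accumulator, and stops once the mass still in transit falls to a tolerance or a step limit is hit. Only the "sum of terminal probabilities close to unity" criterion is implemented. The names Matrix, ArrivalResult, propagate, epsilon and maxSteps are hypothetical, and vertices are indexed from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias: p[i][j] is the probability of the edge from vertex i to vertex j.
using Matrix = std::vector<std::vector<double>>;

struct ArrivalResult {
    std::vector<double> arrived;  // mass absorbed per vertex (nonzero only at terminals)
    double inTransit;             // mass still on non-terminal vertices at the end
    std::size_t steps;            // number of propagation steps performed
};

ArrivalResult propagate(const Matrix& p, std::size_t start,
                        double epsilon, std::size_t maxSteps) {
    const std::size_t n = p.size();
    assert(start < n);

    // A vertex is terminal when its row contains only zeros.
    std::vector<bool> terminal(n, true);
    for (std::size_t i = 0; i < n; ++i) {
        assert(p[i].size() == n);  // the matrix is assumed to be square
        for (std::size_t j = 0; j < n; ++j)
            if (p[i][j] != 0.0)
                terminal[i] = false;
    }

    std::vector<double> current(n, 0.0);
    current[start] = 1.0;
    ArrivalResult result{std::vector<double>(n, 0.0), 1.0, 0};

    for (;;) {
        // Absorb the mass sitting on terminal vertices and total the rest.
        result.inTransit = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            if (terminal[i]) {
                result.arrived[i] += current[i];
                current[i] = 0.0;
            } else {
                result.inTransit += current[i];
            }
        }
        if (result.inTransit <= epsilon || result.steps >= maxSteps)
            break;

        // Advance every remaining unit of mass along its outgoing edges.
        std::vector<double> next(n, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            if (current[i] == 0.0)
                continue;
            for (std::size_t j = 0; j < n; ++j)
                next[j] += current[i] * p[i][j];
        }
        current.swap(next);
        ++result.steps;
    }
    return result;
}
```

When the loop ends because of maxSteps rather than epsilon, inTransit reports how much probability is still unaccounted for, which is the quantity one would examine after an interrupted run. A graph in which some cycle has no exit toward a terminal vertex never satisfies the epsilon test, which is why the step limit is there.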
An Example Cycle Resistant Depth First Algorithm
There are more efficient algorithms than the following one, but it is fairly comprehensible, it compiles without warning with C++ -Wall, and it produces the desired output for all finite and legitimate directed graphs and all possible start and destination vertices [4]. It is easy to load a matrix in the form given in the question using the addEdge method [5].
#include <iostream>
#include <list>
class DirectedGraph {
    private:
        int miNodes;
        std::list<int> * mnpEdges;   // one adjacency list per vertex
        bool * mpVisitedFlags;       // vertices on the route being built

    private:
        void initAlreadyVisited() {
            for (int i = 0; i < miNodes; ++ i)
                mpVisitedFlags[i] = false;
        }

        // Depth first search that records every cycle-free route
        // from iCurrent to iDestination in * pnai.
        void recurse(int iCurrent, int iDestination,
                int route[], int index,
                std::list<std::list<int> *> * pnai) {
            mpVisitedFlags[iCurrent] = true;
            route[index ++] = iCurrent;
            if (iCurrent == iDestination) {
                // Copy the completed route into the result list.
                auto pni = new std::list<int>;
                for (int i = 0; i < index; ++ i)
                    pni->push_back(route[i]);
                pnai->push_back(pni);
            } else {
                auto it = mnpEdges[iCurrent].begin();
                auto itBeyond = mnpEdges[iCurrent].end();
                while (it != itBeyond) {
                    // Skipping visited vertices is what prevents
                    // an infinite loop around a cycle.
                    if (! mpVisitedFlags[* it])
                        recurse(* it, iDestination,
                                route, index, pnai);
                    ++ it;
                }
            }
            -- index;
            mpVisitedFlags[iCurrent] = false;
        }

    public:
        DirectedGraph(int iNodes) {
            miNodes = iNodes;
            mnpEdges = new std::list<int>[iNodes];
            mpVisitedFlags = new bool[iNodes];
        }

        ~DirectedGraph() {
            // Both members were allocated with new [].
            delete [] mnpEdges;
            delete [] mpVisitedFlags;
        }

        void addEdge(int u, int v) {
            mnpEdges[u].push_back(v);
        }

        // The caller owns the returned list and each route in it.
        std::list<std::list<int> *> * findRoutes(int iStart,
                int iDestination) {
            initAlreadyVisited();
            auto route = new int[miNodes];
            auto pnpi = new std::list<std::list<int> *>();
            recurse(iStart, iDestination, route, 0, pnpi);
            delete [] route;
            return pnpi;
        }
};
int main() {
    DirectedGraph dg(5);
    dg.addEdge(0, 1);
    dg.addEdge(0, 2);
    dg.addEdge(0, 3);
    dg.addEdge(1, 3);
    dg.addEdge(1, 4);
    dg.addEdge(2, 0);
    dg.addEdge(2, 1);
    dg.addEdge(4, 1);
    dg.addEdge(4, 3);
    int startingNode = 2;
    int destinationNode = 3;
    auto pnai = dg.findRoutes(startingNode, destinationNode);
    std::cout
            << "Unique routes from "
            << startingNode
            << " to "
            << destinationNode
            << std::endl
            << std::endl;
    bool bFirst;
    std::list<int> * pi;
    auto it = pnai->begin();
    auto itBeyond = pnai->end();
    std::list<int>::iterator itInner;
    std::list<int>::iterator itInnerBeyond;
    // Print one route per line, vertices separated by spaces.
    while (it != itBeyond) {
        bFirst = true;
        pi = * it ++;
        itInner = pi->begin();
        itInnerBeyond = pi->end();
        while (itInner != itInnerBeyond) {
            if (bFirst)
                bFirst = false;
            else
                std::cout << ' ';
            std::cout << (* itInner ++);
        }
        std::cout << std::endl;
        delete pi;
    }
    delete pnai;
    return 0;
}
Notes
[1] A directed graph algorithm that handles cycles improperly will hang in an infinite loop. (Note the trivial case where the number of routes from A to B for the directed graph represented as {A->B, B->A} is infinite.)
[2] Probabilities are sometimes used to reduce the CPU cycle cost of a search. Probabilities, in that strategy, are input values for meta rules in a priority queue to reduce the computational challenge of very tedious searches (even for a computer). The early literature in production systems termed the exponential character of unguided large searches Combinatory Explosions.
[3] It may be practically necessary to detect breadth first probability trend at each vertex and specify satisfactory convergence in terms of four criteria
[4] Provided there are enough computing resources available to support the data structures and ample time to arrive at an answer for the given computing system speed.
[5] You can load DirectedGraph dg(7) with the input data using two loops nested to iterate through the rows and columns enumerated in the question. The body of the inner loop would simply be a conditional edge addition.
if (prob != 0) dg.addEdge(i, j);
Variable prob is the matrix entry P_{i,j} for row i and column j. Route existence is only concerned with zero/nonzero status.
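The two nested loops of note [5] can be sketched as below, assuming the matrix is held as a vector of rows. To keep the sketch self-contained it collects the edges as index pairs rather than calling addEdge directly. The alias Matrix and the function name edgesFromMatrix are hypothetical, and vertices are indexed from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical alias: p[i][j] is the probability of the edge from vertex i to vertex j.
using Matrix = std::vector<std::vector<double>>;

// Collects an (origin, destination) pair for every nonzero matrix entry.
std::vector<std::pair<int, int>> edgesFromMatrix(const Matrix& p) {
    std::vector<std::pair<int, int>> edges;
    for (std::size_t i = 0; i < p.size(); ++i) {
        assert(p[i].size() == p.size());  // the matrix is assumed to be square
        for (std::size_t j = 0; j < p[i].size(); ++j) {
            // Only the zero/nonzero status matters for route existence.
            if (p[i][j] != 0.0)
                edges.emplace_back(static_cast<int>(i), static_cast<int>(j));
        }
    }
    return edges;
}
```

Each returned pair can then be passed to addEdge as dg.addEdge(e.first, e.second) on a DirectedGraph constructed with the matrix size.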