Find the shortest path from source to destination in a directed graph with positive and negative edges, such that at no point in the path the sum of the edge weights traversed so far is negative.
UPDATE: The OP has now had several rounds of clarifications, and it is a different problem now. I'll leave this here to document my ideas for the first version of the problem (or rather my understanding of it). I'll try a new answer for the current version of the problem. End of UPDATE
It's a pity that the OP hasn't clarified some of the open questions. I'll assume the following:

1. All edge weights are +1 or -1 (an edge of weight w could be split into |w| unit edges).
2. n is the number of vertices of the graph.
The first assumption is no loss of generality, obviously, but it has great impact on the value of n (via the second assumption). Without the first assumption, even a tiny (fixed) graph can have arbitrarily long solutions by varying the weights without limits.
The algorithm I propose is quite simple, and similar to well-known graph algorithms. I'm no graph expert though, so I may use the wrong words in some places. Feel free to correct me.
It's clear that each "step" that's not an immediate dead end creates a new (vertex, cost) combination. At most n * n^2 = n^3 of these combinations will be stored, and thus, in a certain sense, this algorithm is O(n^3).
Now, why does this find the optimal path? I don't have a real proof, but I think the following ideas justify why I believe this suffices, and it may be possible to turn them into a real proof.
I think it is clear that the only thing we have to show is that the condition c <= n ^ 2 is sufficient.
First, let's note that any (reachable) vertex can be reached with cost less than n.
Let (v, c) be part of an optimal path and c > n ^ 2. As c > n, there must be some cycle on the path before reaching (v, c), where the cost of the cycle is 0 < m1 < n, and there must be some cycle on the path after reaching (v, c), where the cost of the cycle is 0 > m2 > -n.
Furthermore, let v be reachable from the source with cost 0 <= c1 < n, by a path that touches the first cycle mentioned above, and let the destination be reachable from v with cost 0 <= c2 < n, by a path that touches the other cycle mentioned above.
Then we can construct paths from source to v with costs c1, c1 + m1, c1 + 2 * m1, ..., and paths from v to destination with costs c2, c2 + m2, c2 + 2 * m2, ... . Choose 0 <= a <= |m2| and 0 <= b <= |m1| such that c1 + c2 + a * m1 + b * m2 is minimal (but nonnegative); this is the cost of an optimal path. On this optimal path, v would have the cost c1 + a * m1 < n ^ 2.
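This minimization can be checked by brute force. Here is a small sketch with hypothetical values for c1, c2, m1 and m2 (not taken from any particular graph):

```python
# Hypothetical example values: c1 = cost source -> v, c2 = cost v -> destination,
# m1 > 0 = cost of a cycle before v, m2 < 0 = cost of a cycle after v.
c1, c2, m1, m2 = 2, 3, 3, -2

best = None
for a in range(10):       # extra traversals of the positive cycle
    for b in range(10):   # extra traversals of the negative cycle
        total = c1 + c2 + a * m1 + b * m2
        if total >= 0 and (best is None or total < best[0]):
            best = (total, a, b)

print(best)  # (0, 1, 4): one positive loop and four negative loops cancel the offset
```

Here gcd(m1, |m2|) = gcd(3, 2) = 1, so the minimal achievable cost is 0.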
If the gcd of m1 and |m2| is 1, then the minimal cost will be 0. If the gcd is > 1, it might be possible to choose other cycles such that the gcd becomes 1. If that is not possible, it's also not possible for the optimal solution, and the optimal solution will have a positive cost.
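The gcd case can also be checked by brute force, again with hypothetical cycle costs:

```python
from math import gcd

# Hypothetical cycle costs with gcd > 1: m1 = 4, m2 = -2, so a * m1 + b * m2
# only produces multiples of gcd(4, 2) = 2, and an odd offset c1 + c2 = 5
# can never be reduced to 0.
c12, m1, m2 = 5, 4, -2
print(gcd(m1, -m2))  # 2
totals = {c12 + a * m1 + b * m2 for a in range(20) for b in range(20)}
print(min(t for t in totals if t >= 0))  # 1, not 0: the offset's parity cannot change
```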
(Yes, I can see several problems with this attempt of a proof. It might be necessary to take the gcd of several positive or negative cycle costs etc. I would be very interested in a counterexample, though.)
Here's some (Python) code:
    def f(vertices, edges, source, dest):
        # vertices: unique hashable objects
        # edges: mapping (u, v) -> cost; u, v in vertices, cost in {-1, 1}
        # vertex_costs stores the possible costs for each vertex
        vertex_costs = dict((v, set()) for v in vertices)
        vertex_costs[source].add(0)  # source can be reached with cost 0
        # vertex_costs_from stores, for each (vertex, cost) combination, the previous vertex
        vertex_costs_from = dict()
        # vertex_gotos is a convenience structure mapping a vertex to all ends of outgoing edges and their costs
        vertex_gotos = dict((v, []) for v in vertices)
        for (u, v), c in edges.items():
            vertex_gotos[u].append((v, c))
        max_c = len(vertices) ** 2  # the crucial number: maximal cost that's possible for an optimal path
        todo = [(source, 0)]  # which (vertex, cost) combinations to look at
        while todo:
            u, c0 = todo.pop(0)
            for v, c1 in vertex_gotos[u]:
                c = c0 + c1
                if 0 <= c <= max_c and c not in vertex_costs[v]:
                    vertex_costs[v].add(c)
                    vertex_costs_from[v, c] = u
                    todo.append((v, c))
        if not vertex_costs[dest]:  # destination not reachable
            return None  # or raise some exception
        cost = min(vertex_costs[dest])
        path = [(dest, cost)]  # built in reverse order
        v, c = dest, cost
        while (v, c) != (source, 0):
            u = vertex_costs_from[v, c]
            c -= edges[u, v]
            v = u
            path.append((v, c))
        return path[::-1]  # return the reversed path
And the output for some graphs (the edges with their weights / the path / the cost at each point of the path; sorry, no nice images):
AB+ BC+ CD+ DA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-
A B C D A X Y H I J K L M H
0 1 2 3 4 5 6 7 6 5 4 3 2 1
AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-
A B C D E F G A B C D E F G A B C D E F G A X Y H I J K L M H I J K L M H I J K L M H I J K L M H
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NH-
A X Y H
0 1 2 3
AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NO- OP- PH-
A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G A X Y H I J K L M N O P H I J K L M N O P H I J K L M N O P H I J K L M N O P H I J K L M N O P H
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Here's the code to produce that output:
    def find_path(edges, source, dest):
        from itertools import chain
        print(edges)
        edges = dict(((u, v), 1 if c == "+" else -1) for u, v, c in edges.split())
        vertices = set(chain(*edges))
        path = f(vertices, edges, source, dest)
        path_v, path_c = zip(*path)
        print(" ".join("%2s" % v for v in path_v))
        print(" ".join("%2d" % c for c in path_c))

    source, dest = "AH"
    edges = "AB+ BC+ CD+ DA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-"
    # uv+ means an edge from u to v with cost 1, uv- one with cost -1
    find_path(edges, source, dest)
    edges = "AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MH-"
    find_path(edges, source, dest)
    edges = "AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NH-"
    find_path(edges, source, dest)
    edges = "AB+ BC+ CD+ DE+ EF+ FG+ GA+ AX+ XY+ YH+ HI- IJ- JK- KL- LM- MN- NO- OP- PH-"
    find_path(edges, source, dest)
I would like to clarify a few points:
The current assumptions are:
We may assume, without loss of generality, that the number of vertices is at most n.

Recursively walk the graph and remember the cost values for each vertex. Stop if the cost was already remembered for that vertex, or if the cost would become negative.
When the walk terminates, either the destination has not been reached, and there is no solution; or, for each of the O(n) vertices, we have remembered at most O(n) different cost values, and for each of these O(n ^ 2) combinations there may have been up to n unsuccessful attempts to walk to other vertices. All in all, it's O(n ^ 3). q.e.d.
Update: Of course, there is something fishy again. What does assumption 3 mean: that a path of length O(n) exists if the problem has a solution? Any algorithm has to detect that, because it also has to report when there is no solution. But it's impossible to detect, because it's not a property of the individual graph the algorithm works on (it is asymptotic behaviour).
(It is also clear that not all graphs for which the destination can be reached have a solution path of length O(n): take a chain of m edges of weight -1, preceded by a simple cycle of m edges and total weight +1.)
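This family can be made concrete. The sketch below builds one such graph for m = 5 (a hypothetical construction: a cycle with weights +1, -1, +1, -1, +1, total +1, followed by a chain of m edges of weight -1) and measures the shortest valid path with a BFS over (vertex, cost) states:

```python
from collections import deque

m = 5
edges = {}
for i in range(m):  # the cycle c0 -> c1 -> ... -> c4 -> c0, weights alternating, total +1
    edges[("c%d" % i, "c%d" % ((i + 1) % m))] = 1 if i % 2 == 0 else -1
prev = "c0"
for i in range(1, m + 1):  # the chain c0 -> d1 -> ... -> d5, all weights -1
    edges[(prev, "d%d" % i)] = -1
    prev = "d%d" % i

outgoing = {}
for (u, v), c in edges.items():
    outgoing.setdefault(u, []).append((v, c))

# BFS over (vertex, cost) states; BFS reaches each state with a minimal number of edges
seen = {("c0", 0)}
todo = deque([("c0", 0, 0)])
shortest = None
while todo:
    u, cost, length = todo.popleft()
    if u == "d%d" % m:
        shortest = length
        break
    for v, w in outgoing.get(u, []):
        if cost + w >= 0 and (v, cost + w) not in seen:
            seen.add((v, cost + w))
            todo.append((v, cost + w, length + 1))

# The chain can only be entered with cost m = 5, which takes m full traversals
# of the m-edge cycle: m * m + m = 30 edges, although there are only 2 * m = 10 vertices.
print(shortest)  # 30
```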
[I now realize that most of the Python code from my other answer (attempt for the first version of the problem) can be reused.]