How to increase performance of shortest path using Gremlin?

余生颓废 提交于 2019-12-08 06:03:53

问题


I'm using JanusGraph with Gremlin and this dataset cotaining 2.6k nodes and 6.6k edges (3.3k edges on both sides). I've run the query for 10 minutes without find the shortest path.

Using Gephi the shortest path is almost instantaneous.

Here's my query:

g.V(687).repeat(out().simplePath()).until(hasId(1343)).path().limit(1)

回答1:


With simplePath() your query still processes a lot more paths than necessary. For example, if 688 is a direct neighbor of 687, but also a neighbor of 1000, which is 10 hops away on another path, why would you want to follow the path from 1000 to 688, if you've already seen this crossroad much earlier?

So, you should filter out any crossroad that you've seen before (the first occurrence is always the closest):

g.V(687).store('x').
  repeat(out().where(without('x')).aggregate('x')).
   until(hasId(1343)).limit(1).path()

Also note, that I swapped limit(1) and path; that's because it's a waste of resources (CPU and memory) to gather all paths first and then just take the first one.

UPDATE:

If others want to give it a try, here's the code to load the dataset into TinkerGraph:

g = TinkerGraph.open().traversal()
"http://nrvis.com/download/data/road/road-minnesota.zip".toURL().withInputStream {
  new java.util.zip.ZipInputStream(it).with {
    while (entry = it.getNextEntry()) {
      if ("road-minnesota.mtx" == entry.getName()) {
        it.eachLine {
          if (it ==~ /[0-9]+ [0-9]+/) {
            def (a, b) = it.split()*.toInteger()
            g.V(a).fold().
              coalesce(unfold(), addV().property(id, a)).
              addE("road").
                to(V(b).fold().coalesce(unfold(), addV().property(id, b))).inV().
              addE("road").to(V(a)).iterate()
          }
        }
        break
      }
      it.closeEntry()
    }
  }
}

And the query and a little benchmark:

gremlin> g.V(687).store('x').
......1>   repeat(out().where(without('x')).aggregate('x')).
......2>    until(hasId(1343)).limit(1).
......3>   path().by(id)
==>[687,689,686,677,676,675,673,626,610,606,607,608,735,732,733,730,729,734,737,738,739,742,786,816,840,829,815,825,865,895,872,874,968,983,1009,1044,1140,1142,1148,1219,1255,1329,1337,1339,1348,1343]

gremlin> clock (100) {
......1>   g.V(687).store('x').
......2>     repeat(out().where(without('x')).aggregate('x')).
......3>      until(hasId(1343)).limit(1).
......4>     path().iterate()
......5> }
==>12.5362714

12.5 ms on TinkerGraph looks pretty good to me. Expect it to run a little bit longer on JG, but surely not more than 10 minutes.



来源:https://stackoverflow.com/questions/50239642/how-to-increase-performance-of-shortest-path-using-gremlin

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!