How to increase performance of shortest path using Gremlin?

后端 未结 1 1396
隐瞒了意图╮
隐瞒了意图╮ 2021-01-15 20:13

I\'m using JanusGraph with Gremlin and this dataset cotaining 2.6k nodes and 6.6k edges (3.3k edges on both sides). I\'ve run the query for 10 minutes without find the short

1条回答
  •  天涯浪人
    2021-01-15 20:55

    With simplePath() your query still processes a lot more paths than necessary. For example, if 688 is a direct neighbor of 687, but also a neighbor of 1000, which is 10 hops away on another path, why would you want to follow the path from 1000 to 688, if you've already seen this crossroad much earlier?

    So, you should filter out any crossroad that you've seen before (the first occurrence is always the closest):

    g.V(687).store('x').
      repeat(out().where(without('x')).aggregate('x')).
       until(hasId(1343)).limit(1).path()
    

    Also note, that I swapped limit(1) and path; that's because it's a waste of resources (CPU and memory) to gather all paths first and then just take the first one.

    UPDATE:

    If others want to give it a try, here's the code to load the dataset into TinkerGraph:

    g = TinkerGraph.open().traversal()
    "http://nrvis.com/download/data/road/road-minnesota.zip".toURL().withInputStream {
      new java.util.zip.ZipInputStream(it).with {
        while (entry = it.getNextEntry()) {
          if ("road-minnesota.mtx" == entry.getName()) {
            it.eachLine {
              if (it ==~ /[0-9]+ [0-9]+/) {
                def (a, b) = it.split()*.toInteger()
                g.V(a).fold().
                  coalesce(unfold(), addV().property(id, a)).
                  addE("road").
                    to(V(b).fold().coalesce(unfold(), addV().property(id, b))).inV().
                  addE("road").to(V(a)).iterate()
              }
            }
            break
          }
          it.closeEntry()
        }
      }
    }
    

    And the query and a little benchmark:

    gremlin> g.V(687).store('x').
    ......1>   repeat(out().where(without('x')).aggregate('x')).
    ......2>    until(hasId(1343)).limit(1).
    ......3>   path().by(id)
    ==>[687,689,686,677,676,675,673,626,610,606,607,608,735,732,733,730,729,734,737,738,739,742,786,816,840,829,815,825,865,895,872,874,968,983,1009,1044,1140,1142,1148,1219,1255,1329,1337,1339,1348,1343]
    
    gremlin> clock (100) {
    ......1>   g.V(687).store('x').
    ......2>     repeat(out().where(without('x')).aggregate('x')).
    ......3>      until(hasId(1343)).limit(1).
    ......4>     path().iterate()
    ......5> }
    ==>12.5362714
    

    12.5 ms on TinkerGraph looks pretty good to me. Expect it to run a little bit longer on JG, but surely not more than 10 minutes.

    0 讨论(0)
提交回复
热议问题