I would like to rank items according to a given user's preferences (the items the user has liked), using a random walk on a directed bipartite graph with Gremlin in Groovy.
The graph has the following basic structure:
[User1] ---'likes'---> [ItemA] <---'likes'--- [User2] ---'likes'---> [ItemB]
Here is the query I came up with:
def runRankQuery(def userVertex) {
    def m = [:]
    def c = 0
    while (c < 1000) {
        userVertex
            .out('likes')                   // get all liked items of current or similar user
            .shuffle[0]                     // select randomly one liked item
            .groupCount(m)                  // update counts for selected item
            .in('likes')                    // get all users who also liked item
            .shuffle[0]                     // select randomly one user that liked item
            .loop(5){Math.random() < 0.5}   // follow liked edge of new user (feed new user in loop)
                                            // OR abort query (restart from original user, outer loop)
            .iterate()
        c++
    }
    m = m.sort {a, b -> b.value <=> a.value}
    println "intermediate result $m"
    m.keySet().removeAll(userVertex.out('likes').toList())
    // EDIT (makes no sense - remove): m.each{k,v -> m[k] = v / m.values().sum()}
    // EDIT (makes no sense - remove): m.sort {-it.value }
    return m.keySet() as List;
}
However this code does not find new items ([ItemB] in example above) but only the liked items of the given user (e.g. [ItemA]).
What do I need to change to feed a new user (e.g. [User2]) with the loop step back to the 'out('likes')' step in order to continue the walk?
Once this code is working, can it be seen as an implementation of 'Personalized PageRank'?
Here is the code to run the example:
g = new TinkerGraph()
user1 = g.addVertex()
user1.name ='User1'
user2 = g.addVertex()
user2.name ='User2'
itemA = g.addVertex()
itemA.name ='ItemA'
itemB = g.addVertex()
itemB.name ='ItemB'
g.addEdge(user1, itemA, 'likes')
g.addEdge(user2, itemA, 'likes')
g.addEdge(user2, itemB, 'likes')
println runRankQuery(user1)
And the output:
intermediate result [v[2]:1000]
[]
==>null
gremlin> g.v(2).name
==>ItemA
gremlin>
This turned out to be a really strange issue. I ran into several problems that I can't easily explain, and in the end I'm not sure what causes them. The two that stand out are:
- I'm not sure whether there is a problem with the shuffle step. It does not seem to randomize properly in your case. I can't recreate the problem outside of this case, so I don't know whether it is related to the size of your data or to something else.
- I hit strange problems when using Math.random() to break out of the loop.
Anyway, I think the version below captures the essence of your code, with changes that seem to do what you want:
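runRankQuery = { userVertex ->
    def m = [:]
    def c = 0
    def rand = new java.util.Random()
    while (c < 1000) {
        def max = rand.nextInt(10) + 1                        // random upper bound on loops for this walk
        userVertex._().as('x')
            .out('likes')
            .gather.transform{it[rand.nextInt(it.size())]}    // pick one liked item at random
            .groupCount(m)
            .in('likes')
            .gather.transform{it[rand.nextInt(it.size())]}    // pick one user who liked that item at random
            .loop('x'){it.loops < max}
            .iterate()
        c++
    }
    println "intermediate result $m"
    m.keySet().removeAll(userVertex.out('likes').toList())
    def total = m.values().sum()                              // sum once, before any value is overwritten
    m.each{k,v -> m[k] = v / total}                           // normalize counts to fractions
    m = m.sort {-it.value }                                   // sort returns a new map, so reassign it
    return m.keySet() as List
}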
I replaced shuffle with my own brand of "shuffle" by randomly selecting a single vertex from the gathered list. I also randomly selected a max number of loops rather than relying on Math.random(). When I run this now, I think I get the results you are looking for:
gremlin> runRankQuery(user1)
intermediate result [v[2]:1787, v[3]:326]
==>v[3]
gremlin> runRankQuery(user1)
intermediate result [v[2]:1848, v[3]:330]
==>v[3]
gremlin> runRankQuery(user1)
intermediate result [v[2]:1899, v[3]:339]
==>v[3]
gremlin> runRankQuery(user1)
intermediate result [v[2]:1852, v[3]:360]
==>v[3]
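To make the logic of the walk easier to inspect outside Gremlin, here is a minimal plain-Groovy sketch of the same idea on the example graph. It is not the traversal above: the likes and likedBy maps and the rankItems closure are hypothetical names that stand in for the 'likes' edges, and the hop bound mirrors the max variable only loosely.

```groovy
// Plain-Groovy sketch of the walk; no Gremlin or TinkerGraph required.
// These adjacency maps are an assumed stand-in for the 'likes' edges.
def likes   = [User1: ['ItemA'], User2: ['ItemA', 'ItemB']]    // user -> liked items
def likedBy = [ItemA: ['User1', 'User2'], ItemB: ['User2']]    // item -> users who like it

def rankItems = { String startUser, Random rand, int walks ->
    def counts = [:]
    walks.times {
        def user = startUser                          // every walk restarts at the given user
        def maxHops = rand.nextInt(10) + 1            // 1..10 hops, like 'max' in the answer
        maxHops.times {
            def items = likes[user]
            def item = items[rand.nextInt(items.size())]    // pick one liked item at random
            counts[item] = (counts[item] ?: 0) + 1          // count the visit
            def users = likedBy[item]
            user = users[rand.nextInt(users.size())]        // pick one user who likes that item
        }
    }
    // Drop items the start user already likes, then order by visit count (highest first).
    def unseen = counts.findAll { k, v -> !(k in likes[startUser]) }
    unseen.sort { -it.value }.keySet() as List
}

def ranking = rankItems('User1', new Random(42), 1000)
println ranking
```

On this graph the only item User1 has not liked is ItemB, so the ranking should contain just that item, which matches the v[3] result shown above.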
You might yet get Math.random() to work, as it did behave predictably for me on some iterations of working with this.
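For what it's worth, the two stopping rules do not produce the same walk lengths, which may matter if you go back to Math.random(). The sketch below is plain Groovy with hypothetical helper names (uniformHops, coinFlipHops); it only compares hop counts and does not touch the graph. It assumes the coin-flip rule means "after each hop, continue with probability p".

```groovy
// Hypothetical helpers comparing two ways to decide how many hops a walk takes.
def rand = new Random(7)

// Bounded rule used in the answer: uniform over 1..10 hops.
def uniformHops = { rand.nextInt(10) + 1 }

// Coin-flip rule: take one hop, then keep going while a random draw is below p.
def coinFlipHops = { double p ->
    int hops = 1
    while (rand.nextDouble() < p) {
        hops++
    }
    hops
}

def n = 10000
def avgUniform  = (1..n).collect { uniformHops() }.sum() / n
def avgCoinFlip = (1..n).collect { coinFlipHops(0.5d) }.sum() / n
println "uniform 1..10 : ${avgUniform} hops on average"
println "coin flip 0.5 : ${avgCoinFlip} hops on average"
```

The uniform rule averages about 5.5 hops, while the coin-flip rule with p = 0.5 averages about 2, so the coin-flip version stays much closer to the starting user and will weight nearby items more heavily.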
Source: https://stackoverflow.com/questions/24783212/random-walk-on-bipartite-graph-with-gremlin