Question
I am planning to use a graph database (AWS Neptune), queried with Gremlin, as a sort of knowledge base (KB). The KB would be used as a classification tool for entities with multiple features. For simplicity, I am using geometric shapes to encode the properties of my entities in this example. Let's suppose I want to classify Points that can be related to Squares, Triangles and Circles. I have laid out the different possible relationships of Points with the possible Squares, Triangles and Circles in a graph, as depicted in the picture below.
Created with:
g.addV('Square').property(id, 'S_A')
.addV('Square').property(id, 'S_B')
.addV('Circle').property(id, 'C_A')
.addV('Triangle').property(id, 'T_A')
.addV('Triangle').property(id, 'T_B')
.addV('Point').property(id, 'P1')
.addV('Point').property(id, 'P2')
.addV('Point').property(id, 'P3')
g.V('P1').addE('Has_Triangle').to(g.V('T_B'))
g.V('P2').addE('Has_Triangle').to(g.V('T_A'))
g.V('P1').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Square').to(g.V('S_B'))
The different entities are for example Points, Squares, Triangles, Circles.
So my ultimate goal is to find the Point that satisfies the highest number of conditions. E.g.
g.V().hasLabel('Point').where(and(
out('Has_Triangle').hasId('T_A'),
out('Has_Circle').hasId('C_A'),
out('Has_Square').hasId('S_A')
))
// ==>v[P2]
The query above works very well for classifying a Point with properties (T_A, S_A, C_A) as a Point 2 (P2) type, for example. But if I had to use the same query to classify a Point with properties (C_A, S_B, T_X), for example:
g.V().hasLabel('Point').where(and(
out('Has_Triangle').hasId('T_X'),
out('Has_Circle').hasId('C_A'),
out('Has_Square').hasId('S_B')
))
The query would fail to classify this point as Point 3 (P3), because the KB has no known Triangle property for P3.
Is there a way to express a query that returns the vertex with the highest number of matches, which in this case would be P3?
Thank you in advance.
EDIT
The best idea I have so far is to add sentinel values for KB properties that do not exist, then modify the query to match either the exact property or the sentinel value. But this means that if I add a new "type" of property to a Point in the future, e.g. a Point Has_Hexagon, then I need to add a sentinel Hexagon to all Points in my graph.
EDIT 2
Added Gremlin script that creates sample data
Answer 1:
You can use the choose() step to increment a counter (sack) for each match, then order by counter values (descending) and pick the first one (highest match).
gremlin> g.withSack(0).V().hasLabel('Point').
choose(out('Has_Triangle').hasId('T_A'), sack(sum).by(constant(1))).
choose(out('Has_Circle').hasId('C_A'), sack(sum).by(constant(1))).
choose(out('Has_Square').hasId('S_A'), sack(sum).by(constant(1))).
order().
by(sack(), decr).
limit(1)
==>v[P2]
gremlin> g.withSack(0).V().hasLabel('Point').
choose(out('Has_Triangle').hasId('T_X'), sack(sum).by(constant(1))).
choose(out('Has_Circle').hasId('C_A'), sack(sum).by(constant(1))).
choose(out('Has_Square').hasId('S_B'), sack(sum).by(constant(1))).
order().
by(sack(), decr).
limit(1)
==>v[P3]
Each choose() step in the queries above can be read as: if (condition) increment-counter. Whether or not the condition is met, the choose() step emits the original vertex (Point).
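The counting that choose() and sack() perform can also be modeled outside Gremlin. The following is a minimal plain-Python sketch, not Gremlin and not how Neptune evaluates the traversal. The GRAPH dict and the score and best_match helpers are hypothetical names introduced for illustration; the edges are copied from the sample script in the question.

```python
# Plain-Python model of the sack-based scoring (illustrative only, not Gremlin).
# Edges from the sample graph: point id -> {edge label: set of target vertex ids}.
GRAPH = {
    "P1": {"Has_Triangle": {"T_B"}, "Has_Square": {"S_A"}},
    "P2": {"Has_Triangle": {"T_A"}, "Has_Square": {"S_A"}, "Has_Circle": {"C_A"}},
    "P3": {"Has_Circle": {"C_A"}, "Has_Square": {"S_B"}},
}


def score(point, conditions):
    """Count how many (edge label -> target id) conditions the point satisfies.

    Each condition plays the role of one choose() step: a match adds 1 to the
    counter, a miss leaves the counter unchanged and the point stays a candidate.
    """
    edges = GRAPH[point]
    return sum(
        1
        for label, target in conditions.items()
        if target in edges.get(label, set())
    )


def best_match(conditions):
    """Return the point with the highest score (ties go to the first point in GRAPH)."""
    return max(GRAPH, key=lambda point: score(point, conditions))


print(best_match({"Has_Triangle": "T_A", "Has_Circle": "C_A", "Has_Square": "S_A"}))
print(best_match({"Has_Triangle": "T_X", "Has_Circle": "C_A", "Has_Square": "S_B"}))
```

With the sample data, the first call selects P2 (three matches) and the second selects P3 (two matches, since no Point has a T_X triangle). A Point that matches nothing still gets a score of 0 rather than being filtered out, which is the difference from the where(and(...)) query in the question.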
Source: https://stackoverflow.com/questions/56922227/gremlin-find-highest-match