Question
I use the Microsoft.Azure.Graphs library to connect to a Cosmos DB instance and query the graph database.
I'm trying to optimize my Gremlin queries so that they select only the properties I actually need. However, I don't know how to choose which properties to select from edges and vertices.
Let's say we start from this query:
gremlin> g.V().hasLabel('user').
  project('user', 'edges', 'relatedVertices').
    by().
    by(bothE().fold()).
    by(both().fold())
This will return something along the lines of:
{
  "user": {
    "id": "<userId>",
    "type": "vertex",
    "label": "user",
    "properties": [
      // all vertex properties
    ]
  },
  "edges": [{
    "id": "<edgeId>",
    "type": "edge",
    "label": "<edgeName>",
    "inV": "<relatedVertexId>",
    "inVLabel": "<relatedVertexLabel>",
    "outV": "<relatedVertexId>",
    "outVLabel": "<relatedVertexLabel>",
    "properties": [
      // edge properties, if any
    ]
  }],
  "relatedVertices": [{
    "id": "<vertexId>",
    "type": "vertex",
    "label": "<relatedVertexLabel>",
    "properties": [
      // all related vertex properties
    ]
  }]
}
Now let's say we take only a couple of properties from the root vertex, which we labeled "user":
gremlin> g.V().hasLabel('user').
  project('id', 'prop1', 'prop2', 'edges', 'relatedVertices').
    by(id).
    by('prop1').
    by('prop2').
    by(bothE().fold()).
    by(both().fold())
This gets us part of the way and yields something along the lines of:
{
  "id": "<userId>",
  "prop1": "value1",
  "prop2": "value2",
  "edges": [{
    "id": "<edgeId>",
    "type": "edge",
    "label": "<edgeName>",
    "inV": "<relatedVertexId>",
    "inVLabel": "<relatedVertexLabel>",
    "outV": "<relatedVertexId>",
    "outVLabel": "<relatedVertexLabel>",
    "properties": [
      // edge properties, if any
    ]
  }],
  "relatedVertices": [{
    "id": "<vertexId>",
    "type": "vertex",
    "label": "<relatedVertexLabel>",
    "properties": [
      // all related vertex properties
    ]
  }]
}
Is it possible to do something similar for the edges and related vertices? For example, something along the lines of:
gremlin> g.V().hasLabel('user').
  project('id', 'prop1', 'prop2', 'edges', 'relatedVertices').
    by(id).
    by('prop1').
    by('prop2').
    by(bothE().fold().
         project('edgeId', 'edgeLabel', 'edgeInV', 'edgeOutV').
           by(id).
           by(label).
           by(inV).
           by(outV)).
    by(both().fold().
         project('vertexId', 'someProp1', 'someProp2').
           by(id).
           by('someProp1').
           by('someProp2'))
My aim is to get an output like this:
{
  "id": "<userId>",
  "prop1": "value1",
  "prop2": "value2",
  "edges": [{
    "edgeId": "<edgeId>",
    "edgeLabel": "<edgeName>",
    "edgeInV": "<relatedVertexId>",
    "edgeOutV": "<relatedVertexId>"
  }],
  "relatedVertices": [{
    "vertexId": "<vertexId>",
    "someProp1": "someValue1",
    "someProp2": "someValue2"
  }]
}
Answer 1:
You were pretty close:
gremlin> g.V().hasLabel('person').
           project('name','age','edges','relatedVertices').
             by('name').
             by('age').
             by(bothE().
                  project('id','inV','outV').
                    by(id).
                    by(inV().id()).
                    by(outV().id()).
                  fold()).
             by(both().
                  project('id','name').
                    by(id).
                    by('name').
                  fold())
==>[name:marko,age:29,edges:[[id:9,inV:3,outV:1],[id:7,inV:2,outV:1],[id:8,inV:4,outV:1]],relatedVertices:[[id:3,name:lop],[id:2,name:vadas],[id:4,name:josh]]]
==>[name:vadas,age:27,edges:[[id:7,inV:2,outV:1]],relatedVertices:[[id:1,name:marko]]]
==>[name:josh,age:32,edges:[[id:10,inV:5,outV:4],[id:11,inV:3,outV:4],[id:8,inV:4,outV:1]],relatedVertices:[[id:5,name:ripple],[id:3,name:lop],[id:1,name:marko]]]
==>[name:peter,age:35,edges:[[id:12,inV:3,outV:6]],relatedVertices:[[id:3,name:lop]]]
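As a sanity check on the shape of that result, here is a plain-Python toy model of the same traversal run over TinkerPop's "modern" sample graph (the graph behind the console output above). It is not Gremlin and not a Cosmos DB client: VERTICES, EDGES, both_e, both and project_person are hypothetical names, and ordinary Python lists stand in for what fold() produces. What it illustrates is the order of operations: each edge or neighbor is projected to a small map first, and only then are those maps collected into one list.

```python
# Toy in-memory model of TinkerPop's "modern" sample graph (NOT a Gremlin API).
VERTICES = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}
# edge id -> (outV, label, inV)
EDGES = {
    7: (1, "knows", 2),
    8: (1, "knows", 4),
    9: (1, "created", 3),
    10: (4, "created", 5),
    11: (4, "created", 3),
    12: (6, "created", 3),
}


def both_e(vid):
    """Rough analogue of bothE(): ids of every edge touching vertex vid."""
    return [eid for eid, (out_v, _, in_v) in EDGES.items() if vid in (out_v, in_v)]


def both(vid):
    """Rough analogue of both(): the vertex at the other end of each edge."""
    neighbors = []
    for eid in both_e(vid):
        out_v, _, in_v = EDGES[eid]
        neighbors.append(in_v if out_v == vid else out_v)
    return neighbors


def project_person(vid):
    """Mirrors project('name','age','edges','relatedVertices') for one vertex."""
    v = VERTICES[vid]
    return {
        "name": v["name"],
        "age": v["age"],
        # like bothE().project('id','inV','outV')...fold():
        # project each edge first, then collect the maps into one list
        "edges": [
            {"id": eid, "inV": EDGES[eid][2], "outV": EDGES[eid][0]}
            for eid in both_e(vid)
        ],
        # like both().project('id','name')...fold()
        "relatedVertices": [
            {"id": other, "name": VERTICES[other]["name"]} for other in both(vid)
        ],
    }


# like g.V().hasLabel('person')
results = [project_person(vid) for vid, v in VERTICES.items() if v["label"] == "person"]
for r in results:
    print(r)
```

Running it prints one dictionary per person with the same contents as the console output above; the order of items inside each list may differ, since nothing in the query imposes an order.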
Two points you should consider when writing Gremlin:
- The output of one step is the input to the next, so if you don't clearly see what comes out of a particular step, the steps that follow may not end up being right. In your example, in the first by() you added the project() after the fold(), which basically says "Hey, Gremlin, project that List of edges for me." But in the by() modulators of that project() you treated its input not as a List but as individual edges, which likely led to an error. In Java, that error is: "java.util.ArrayList cannot be cast to org.apache.tinkerpop.gremlin.structure.Element". An error like that is a clue that somewhere in your Gremlin you are not properly following the outputs and inputs of your steps. fold() takes all the elements in the traversal's stream and converts them to a List, so where you had many objects before the fold(), you now have one. To process them as a stream again, you would need to unfold() them so that the following steps can operate on them individually. In this case, we just needed to move the fold() to the end of the statement, after doing the sub-project() for each edge/vertex.
- But why do we need fold() at all? The answer is that the traversal passed to the by() modulator is not iterated completely by the step it modifies (in this case project()). The step only calls next() to get the first element in the stream; this is by design. Therefore, whenever you want the entire stream of a by() to be processed, you must reduce that stream to a single object. You might do that with fold(), but other examples include sum(), count(), mean(), etc.
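To make those two points concrete, here is a small plain-Python toy model. It is an illustration under stated assumptions, not TinkerPop code: by, fold, count, both_e and project_edge are hypothetical helpers that only imitate the behavior described above (by() keeps the first result of its child traversal; fold() and count() reduce a stream to one object). The sample edges are marko's edges from the console output. The last part mimics the original mistake, folding first and then projecting the List as if it were a single edge.

```python
from typing import Any, Callable, Iterable

# Hypothetical model: a "traversal" takes one element and yields a stream.
Traversal = Callable[[Any], Iterable[Any]]


def by(child: Traversal, element: Any) -> Any:
    """Toy by(): run the child on element, keep only its first result (next())."""
    return next(iter(child(element)))


def fold(child: Traversal) -> Traversal:
    """Toy fold(): reduce the child's whole stream to a single list."""
    return lambda element: [list(child(element))]


def count(child: Traversal) -> Traversal:
    """Toy count(): reduce the child's whole stream to a single number."""
    return lambda element: [sum(1 for _ in child(element))]


def both_e(vertex: dict) -> Iterable[dict]:
    """Hypothetical stand-in for bothE(): yields the vertex's incident edges."""
    yield from vertex["edges"]


def project_edge(edge: Any) -> dict:
    """Toy per-edge project('id','inV','outV'); expects ONE edge, not a list."""
    return {"id": edge["id"], "inV": edge["inV"], "outV": edge["outV"]}


marko = {"edges": [
    {"id": 9, "inV": 3, "outV": 1},
    {"id": 7, "inV": 2, "outV": 1},
    {"id": 8, "inV": 4, "outV": 1},
]}

first_only = by(both_e, marko)         # no reducer: only the first edge survives
all_edges = by(fold(both_e), marko)    # fold(): the whole stream as one list
edge_count = by(count(both_e), marko)  # count(): the stream reduced to a number

# Right order: project each edge, then fold the resulting maps into one list.
projected = by(fold(lambda v: (project_edge(e) for e in both_e(v))), marko)

# Wrong order: fold first, then project the single List as if it were an edge.
try:
    by(lambda v: [project_edge(x) for x in fold(both_e)(v)], marko)
    wrong_order_error = None
except TypeError as err:  # plays the role of Java's ArrayList -> Element cast error
    wrong_order_error = err

print(first_only["id"], len(all_edges), edge_count, projected)
print(wrong_order_error)
```

In this model, first_only is just edge 9, all_edges holds all three edges, edge_count is 3, and the fold-then-project variant fails with a TypeError because project_edge receives a list instead of an edge.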
Source: https://stackoverflow.com/questions/48277312/gremlin-on-azure-cosmosdb-how-to-project-the-related-vertices-properties