gremlin - query optimization - property value counts for multiple interval ranges

Submitted by 早过忘川 on 2021-01-28 13:41:28

Question


Given a vertex, an edge property, and pre-defined interval ranges [(0,100), (100,500), (500,1000), (1000,5000), ...], I want to count, for each interval, how many of the vertex's edges have a property value that falls within it.

For example, the vertex 446656 has 5 edges, each with a property trxn_amt, with the following values: [92, 380, 230, 899, 102]. This would give group counts {(0,100): 1, (100,500): 3, (500,1000): 1, (1000,5000): 0, ...}.

My question is split into two parts.

Firstly, is there a cleaner implementation than the following project query?

g.V(446656).project('num_trxn_0_100', 'num_trxn_100_500')
    .by(bothE().where(values('trxn_amt').is(between(0.0, 100.0))).count())
    .by(bothE().where(values('trxn_amt').is(between(100.0, 500.0))).count())

==>{num_trxn_0_100=1, num_trxn_100_500=3}

^ Imagine more intervals

Secondly, how can I include an edge filter which isn't computed multiple times?

I want to add a date filter (i.e. bothE() -> bothE().has('trxn_dt_int', lt(999999999999))), and I don't want this filter to be evaluated again for each .by(...) step. Is there a way to compute the filter once, store the result, and reuse it later? Alternatively, if I do include it multiple times, is there any optimization under the hood that ensures it is only computed once?


Answer 1:


Firstly, is there a cleaner implementation than the following project query?

I think you have already spotted the issue with that approach, which is why you are asking: you traverse bothE() once per interval to get your answer. That ties into your second question:

Secondly, how can I include an edge filter which isn't computed multiple times?

I think this query is better written with groupCount(). To demonstrate, I've used the Grateful Dead graph:

gremlin> g = TinkerFactory.createGratefulDead().traversal()
==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]
gremlin> g.V(3).
......1>   bothE('followedBy').
......2>   groupCount().
......3>     by(choose(values('weight')).
......4>          option(between(0, 24), constant('small')).
......5>          option(between(25, 99), constant('medium')).
......6>          option(gte(100), constant('big')))
==>[small:140,big:2,medium:7]

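Now just add your date filter on the edges before groupCount(), and it only has to be evaluated once. Note that between() includes its lower bound and excludes its upper bound, so for contiguous ranges like yours each bucket should start where the previous one ends, e.g. between(0, 100) followed by between(100, 500).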
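
Applied to the query from the question, the same pattern might look roughly like the sketch below. It is not tested against the asker's graph. It assumes the property names from the question (trxn_amt, trxn_dt_int) and a TinkerPop version in which option() accepts predicates, as the Grateful Dead example does. The bucket labels reuse the question's naming, and the 'other' fallback label is a hypothetical addition for values outside every interval.

```groovy
g.V(446656).
  bothE().
  has('trxn_dt_int', lt(999999999999)).   // date filter, evaluated once per edge
  groupCount().
    by(choose(values('trxn_amt')).
         // between(a, b) matches a <= value < b, so shared boundaries leave no gaps
         option(between(0.0, 100.0), constant('num_trxn_0_100')).
         option(between(100.0, 500.0), constant('num_trxn_100_500')).
         option(between(500.0, 1000.0), constant('num_trxn_500_1000')).
         option(between(1000.0, 5000.0), constant('num_trxn_1000_5000')).
         // hypothetical catch-all for values that match no interval
         option(none, constant('other')))
```

One difference from the project() version: groupCount() only returns keys that at least one edge mapped to, so an empty interval such as (1000, 5000) is simply missing from the result rather than reported as 0. Treat missing keys as zero on the client side if you need every interval present.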



Source: https://stackoverflow.com/questions/61824401/gremlin-query-optimization-property-value-counts-for-multiple-interval-range
