Maintain statistics across rows in Accumulo

Submitted by 二次信任 on 2019-12-12 03:36:37

Question


I am relatively new to Accumulo, so would greatly appreciate general tips for doing this better.

I have row IDs made up of a time component and a geographic component. I'd like to maintain statistics (counts, sums, etc.) in an iterator of some sort, emitting mutations to other rows as part of the ingest. In other words, when I insert a row:

<timeA>_<geoX> colFam:colQual value

In addition to the mutation above, I'd like to maintain stats in separate rows in the same table (or a different one) as follows:

timeA_countRow colFam:colQual count++
geoX_countRow colFam:colQual count++
timeA_sumRow colFam:colQual sum += value
geoX_sumRow colFam:colQual sum += value

What is the best way to accomplish this? I have seen the StatsCombiner, but as I understand it, it works within a single row. I'd like to maintain stats based on parts of the key.

Thanks!


Answer 1:


In addition to the mutation above, I'd like to maintain stats in separate rows in the same table (or a different one) as follows

This fundamentally does not work with Accumulo. Within the confines of an Iterator, you cannot know about data in a separate row. That's why the StatsCombiner is written in the context of a single row: no other row is guaranteed to be contained in the same Tablet (the physical data boundary) as the row being processed.
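To make the tablet boundary concrete, here is a small self-contained Java sketch (a model, not Accumulo code) that assigns rows to tablets the way Accumulo extents are laid out: each tablet holds rows greater than the previous split point, up to and including its own end row. The class `TabletLookup`, the method `tabletFor`, and the single split point `"h"` are hypothetical. Java's `String.compareTo` stands in for Accumulo's byte-wise row ordering, which agrees with it for ASCII row IDs like these.

```java
import java.util.List;
import java.util.TreeSet;

public class TabletLookup {
    // Hypothetical helper: returns the index of the tablet that holds `row`.
    // A row belongs to the first tablet whose end row (split point) is >= the row;
    // rows sorting after the last split belong to the final tablet.
    static int tabletFor(TreeSet<String> splits, String row) {
        String endRow = splits.ceiling(row);            // smallest split >= row, or null
        if (endRow == null) {
            return splits.size();                       // past every split: last tablet
        }
        return splits.headSet(endRow).size();           // number of splits before endRow
    }

    public static void main(String[] args) {
        TreeSet<String> splits = new TreeSet<>(List.of("h")); // hypothetical split point
        for (String row : List.of("timeA_geoX", "timeA_countRow", "geoX_countRow")) {
            System.out.println(row + " -> tablet " + tabletFor(splits, row));
        }
    }
}
```

With that split, `timeA_geoX` and `timeA_countRow` land in tablet 1 while `geoX_countRow` lands in tablet 0, so an iterator running in the tablet that receives `timeA_geoX` never sees the geo count row, and those tablets may be hosted on different tablet servers.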

A common approach is to maintain this information client-side, in a separate table (or locality group) configured with a SummingCombiner. Whenever you insert an update for a specific column, you also submit a corresponding update to your stats table.
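The client-side pattern can be sketched without a running cluster. In the self-contained Java model below, each `Update` stands in for an Accumulo `Mutation` carrying one `put`, and a `TreeMap` merged with `Long::sum` stands in for a stats table with a `SummingCombiner` attached. The names `ClientSideStats`, `Update`, `statsUpdates`, and `write`, and the assumption that the first `_` separates the time and geo parts of the row ID, are all hypothetical. In a real client the updates would be sent through a `BatchWriter`, alongside the write of the data row itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClientSideStats {
    // One cell update; stands in for an Accumulo Mutation carrying a single put().
    static final class Update {
        final String row, colFam, colQual;
        final long value;
        Update(String row, String colFam, String colQual, long value) {
            this.row = row; this.colFam = colFam; this.colQual = colQual; this.value = value;
        }
    }

    // Hypothetical helper: for a data row "<time>_<geo>", build the four stats updates
    // from the question. Assumes the time part contains no '_'.
    static List<Update> statsUpdates(String dataRow, String colFam, String colQual, long value) {
        int sep = dataRow.indexOf('_');
        if (sep < 0) {
            throw new IllegalArgumentException("expected <time>_<geo>: " + dataRow);
        }
        String time = dataRow.substring(0, sep);
        String geo = dataRow.substring(sep + 1);
        List<Update> out = new ArrayList<>();
        out.add(new Update(time + "_countRow", colFam, colQual, 1));     // count += 1
        out.add(new Update(geo + "_countRow", colFam, colQual, 1));
        out.add(new Update(time + "_sumRow", colFam, colQual, value));   // sum += value
        out.add(new Update(geo + "_sumRow", colFam, colQual, value));
        return out;
    }

    // Stand-in for a stats table with a SummingCombiner: values written to the
    // same row and column are added together instead of replacing each other.
    static void write(Map<String, Long> table, Update u) {
        table.merge(u.row + " " + u.colFam + ":" + u.colQual, u.value, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new TreeMap<>();
        // Two ingested records sharing timeA but with different geo components.
        for (Update u : statsUpdates("timeA_geoX", "colFam", "colQual", 5)) write(stats, u);
        for (Update u : statsUpdates("timeA_geoY", "colFam", "colQual", 7)) write(stats, u);
        stats.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

After both records, `timeA_countRow` holds 2 and `timeA_sumRow` holds 12, while each geo row reflects only its own record. Because Accumulo makes only a single row's mutation atomic, the data write and the stats writes are independent and can diverge if a client fails partway through.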

You could also look into Fluo, which allows you to perform cross-row transactions. It is a different beast from plain Accumulo and is still in beta.



Source: https://stackoverflow.com/questions/36268152/maintain-statistics-across-rows-in-accumulo
