Question
I am relatively new to Accumulo, so I would greatly appreciate general tips for doing this better.
I have rowIds that are made up of a time component and a geographic component. I'd like to maintain statistics (counts, sums, etc.) in an iterator of some sort, but I would also like to emit mutations to other rows as part of the ingest. In other words, when I insert a row:
<timeA>_<geoX> colFam:colQual value
In addition to the mutation above, I'd like to maintain stats in separate rows in the same table (or a different one) as follows:
timeA_countRow colFam:colQual count++
geoX_countRow colFam:colQual count++
timeA_sumRow colFam:colQual sum += value
geoX_sumRow colFam:colQual sum += value
What is the best way to accomplish this? I have seen the StatsCombiner, but as I understand it, that only works within a single row. I'd like to maintain stats keyed on parts of the rowId...
Thanks!
Answer 1:
In addition to the mutation above, I'd like to maintain stats in separate rows in the same table (or a different one) as follows
This fundamentally does not work in Accumulo. Within an Iterator, you cannot see data in a different row. That's why the StatsCombiner operates within the context of a single row: any other row is not guaranteed to be in the same Tablet (the unit of physical data partitioning) that the iterator is running on.
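To make the tablet boundary concrete, here is a toy model (plain Java, not Accumulo code) of how a table is split into tablets. Each tablet holds a contiguous, sorted range of rows ending at an inclusive end row. The split points "g" and "t" are hypothetical, chosen only for illustration, and the comparison assumes ASCII row IDs so that `String.compareTo` matches Accumulo's byte ordering. With these splits, `timeA_geoX` and `geoX_countRow` end up in different tablets, so an iterator processing one of them never sees the other.

```java
import java.util.List;
import java.util.TreeSet;

// Toy model (not Accumulo code): a table partitioned into tablets by sorted split points.
// A tablet covers the rows in (previous split, this split]; the last tablet covers
// every row greater than the final split.
public class TabletRouting {

    // Returns the end row of the tablet that would hold `row`, or "<last>" for the final tablet.
    public static String tabletFor(TreeSet<String> splits, String row) {
        String endRow = splits.ceiling(row); // smallest split >= row, since end rows are inclusive
        return endRow == null ? "<last>" : endRow;
    }

    public static void main(String[] args) {
        // Hypothetical split points, chosen only for illustration.
        TreeSet<String> splits = new TreeSet<>(List.of("g", "t"));

        for (String row : List.of("timeA_geoX", "timeA_countRow", "geoX_countRow")) {
            System.out.println(row + " -> tablet ending at " + tabletFor(splits, row));
        }
    }
}
```

Real split points are chosen by Accumulo (or by an administrator), and they change as the table grows, which is why no iterator can assume a related row lives alongside the row it is processing.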
A common approach is to maintain this information on the client side, using a separate table (or a separate locality group) configured with a SummingCombiner. Whenever you insert an update for a specific column, you also submit an update to your stats table.
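A minimal sketch of that client-side fan-out, under stated assumptions: rowIds look like `<time>_<geo>` with no `_` inside the time part, values are whole numbers, and the stats table adds together every value written to the same key. The names `StatsFanOut`, `StatUpdate`, `statsFor`, and `apply` are hypothetical. In a real client, each `StatUpdate` would become a `Mutation` sent through a `BatchWriter` alongside the data mutation, and the stats table would have a SummingCombiner attached. The `TreeMap` below only stands in for that server-side summing so the example runs without an Accumulo instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of client-side statistics maintenance. Assumptions (hypothetical): rowIds are
// "<time>_<geo>" with no '_' in the time part, values are longs, and the stats table
// sums all values written to the same key (the role a SummingCombiner would play).
public class StatsFanOut {

    // One delta destined for the stats table: row, column family, column qualifier, amount.
    record StatUpdate(String row, String colFam, String colQual, long delta) {}

    // Builds the four stats deltas that accompany a single data write.
    public static List<StatUpdate> statsFor(String rowId, String colFam, String colQual, long value) {
        int sep = rowId.indexOf('_');
        if (sep < 0) {
            throw new IllegalArgumentException("expected <time>_<geo> rowId: " + rowId);
        }
        String time = rowId.substring(0, sep);
        String geo = rowId.substring(sep + 1);

        List<StatUpdate> updates = new ArrayList<>();
        updates.add(new StatUpdate(time + "_countRow", colFam, colQual, 1));     // count++
        updates.add(new StatUpdate(geo + "_countRow", colFam, colQual, 1));      // count++
        updates.add(new StatUpdate(time + "_sumRow", colFam, colQual, value));   // sum += value
        updates.add(new StatUpdate(geo + "_sumRow", colFam, colQual, value));    // sum += value
        return updates;
    }

    // Stand-in for the stats table: deltas written to the same key are added together.
    public static void apply(Map<String, Long> statsTable, List<StatUpdate> updates) {
        for (StatUpdate u : updates) {
            String key = u.row() + " " + u.colFam() + ":" + u.colQual();
            statsTable.merge(key, u.delta(), Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> statsTable = new TreeMap<>();
        apply(statsTable, statsFor("timeA_geoX", "colFam", "colQual", 5));
        apply(statsTable, statsFor("timeA_geoY", "colFam", "colQual", 7));
        statsTable.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The client writes deltas rather than new totals, so it never has to read the current count before writing; the combiner folds the deltas together. The data write and the stats writes touch different rows, though, so they are not applied atomically: a failure between them can leave the stats out of sync with the data.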
You could also look into Fluo, which allows you to perform cross-row transactions. It is a different beast from plain Accumulo and is still in beta.
Source: https://stackoverflow.com/questions/36268152/maintain-statistics-across-rows-in-accumulo