问题
I am new to Accumulo. I know that I can write Java code to scan, insert, update and delete data using Hadoop and MapReduce. What I would like to know is whether aggregation is possible in Accumulo.
I know that in MySql we can use groupby
,orderby
,max
,min
,count
,sum
,join
s, nested queries, etc. Is their is any possibility to use these functions in Accumulo either directly or indirectly.
回答1:
Accumulo does support aggregation through the use of combiner iterators (Accumulo Combiner Example ).
Iterators mostly run server-side, but can be run client-side, and can perform quite a bit of computation before sending the data back to your client.
Accumulo comes packaged with many iterators, more specifically the summingCombiner is used to sum the values of entries. Dave Medinet's has a blog that has some good examples (Accumulo Blog). More specifically, using the summingCombiner to implement wordcount (Word Count in Accumulo). I also suggest signing up for the Accumulo users mailing lists (mailing lists).
回答2:
I like to think Accumulo has great agg functionality. I run an OLAP solution on it with hundreds of millions of keys on 40 nodes. In addition to the basic SummingCombiner, I recommend the newer statscombiner as well
http://accumulo.apache.org/1.4/apidocs/org/apache/accumulo/examples/simple/combiner/StatsCombiner.html
which gives you basic stats about a set of keys.
You can set combiners at maj compaction, minor compaction or scan time. If you have a ton of data with a lot of trickled keys, I don't recommend scan time combining, because it can slow down the scan time (not always).
HTH
回答3:
Some aggregation is supported in Accumulo, over multiple entries, and even multiple rows, within each tablet. Aggregation across tablets would need to be done on the client side or in a MapReduce job.
回答4:
Yes, Aggregations are possible in Accumulo. you can achieve them by -
1) Using in built Combiners which aggregate data when you ingest.
2) Make Customised Aggregation Iterator and then deploy it at minor or majour compactions.
来源:https://stackoverflow.com/questions/19048613/does-accumulo-support-aggregation