Question
I need some help. I have a query that should return the top 5 records grouped by date (the date only, not date + time), together with the sum of amount for each date.
I wrote the following, but it returns all the records, not just the top 5:
CREATE OR REPLACE FUNCTION state_groupbyandsum( state map<text, double>, datetime text, amount text )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java
AS 'String date = datetime.substring(0,10); Double count = (Double) state.get(date); if (count == null) count = Double.parseDouble(amount); else count = count + Double.parseDouble(amount); state.put(date, count); return state;' ;
CREATE OR REPLACE AGGREGATE groupbyandsum(text, text)
SFUNC state_groupbyandsum
STYPE map<text, double>
INITCOND {};
select groupbyandsum(datetime, amout) from warehouse;
Could you please help me get just the top 5 records?
Answer 1:
Here's one way to do that. Your group-by state function could look like this:
CREATE FUNCTION state_group_and_total( state map<text, double>, type text, amount double )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
Double count = (Double) state.get(type);
if (count == null)
count = amount;
else
count = count + amount;
state.put(type, count);
return state;
';
That builds up a map of the summed amounts, keyed by type, for all the rows selected by your query's WHERE clause. Now the tricky part is how to keep just the top N. One way to do it is with a FINALFUNC, which gets executed after all the rows have been put in the map. The function below loops over the map to find the maximum value and moves that entry to a result map, so finding the top N takes N passes over the map (there are more efficient algorithms than this, but it's just a quick and dirty example).
So here's an example that finds the top two (for the top five, set topN to 5):
CREATE FUNCTION topFinal (state map<text, double>)
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
java.util.Map<String, Double> inMap = new java.util.HashMap<String, Double>(),
outMap = new java.util.HashMap<String, Double>();
inMap.putAll(state);
int topN = 2;
for (int i = 1; i <= topN; i++) {
double maxVal = Double.NEGATIVE_INFINITY; // start below any total so negative sums can still be picked
String moveKey = null;
for (java.util.Map.Entry<String, Double> entry : inMap.entrySet()) {
if (entry.getValue() > maxVal) {
maxVal = entry.getValue();
moveKey = entry.getKey();
}
}
if (moveKey != null) {
outMap.put(moveKey, maxVal);
inMap.remove(moveKey);
}
}
return outMap;
';
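As one example of a more efficient approach than N passes over the map, the selection can be done in a single pass with a min-heap that never holds more than N entries. This is only a minimal sketch in plain Java, run outside Cassandra: the names TopNSketch and topN are made up for illustration, and whether the UDF sandbox accepts this code inside a FINALFUNC body is an assumption that is untested here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the original answer.
final class TopNSketch {

    // Returns the n entries with the largest values; ties are broken arbitrarily.
    static Map<String, Double> topN(Map<String, Double> totals, int n) {
        Map<String, Double> out = new HashMap<>();
        if (n <= 0) {
            return out;
        }
        // Min-heap ordered by value: the root is the smallest entry kept so far.
        PriorityQueue<Map.Entry<String, Double>> heap =
            new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest, keeping at most n entries
            }
        }
        for (Map.Entry<String, Double> e : heap) {
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> totals = new HashMap<>();
        totals.put("2015", 99.1);
        totals.put("2016", 18.12);
        totals.put("2017", 44.889);
        // Keeps the entries for 2015 and 2017; map iteration order is unspecified.
        System.out.println(topN(totals, 2));
    }
}
```

For a map with M entries this does O(M log N) work instead of the O(M * N) of the repeated-maximum loop, which only matters once M and N get large.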
Then lastly you need to define the AGGREGATE to call the two functions you defined:
CREATE OR REPLACE AGGREGATE group_and_total(text, double)
SFUNC state_group_and_total
STYPE map<text, double>
FINALFUNC topFinal
INITCOND {};
So let's see if that works.
CREATE table test (partition int, clustering text, amount double, PRIMARY KEY (partition, clustering));
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2015', 99.1);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2016', 18.12);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2017', 44.889);
SELECT * from test;
partition | clustering | amount
-----------+------------+--------
1 | 2015 | 99.1
1 | 2016 | 18.12
1 | 2017 | 44.889
Now, drum roll...
SELECT group_and_total(clustering, amount) from test where partition=1;
agg.group_and_total(clustering, amount)
-------------------------------------------
{'2015': 99.1, '2017': 44.889}
So you see it kept the top 2 rows based on the amount.
Note that the keys won't be in sorted order since it's a map, and I don't think we can control the key order in the map, so sorting in the FINALFUNC would be a waste of resources. If you need the map sorted then you could do that in the client.
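Sorting on the client could look something like the sketch below. It is plain Java and assumes the driver hands the aggregate result back as a java.util.Map<String, Double>; the names SortByValueSketch and sortedByValueDesc are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper, not part of the original answer.
final class SortByValueSketch {

    // Copies the entries into a LinkedHashMap ordered by value, largest first.
    static LinkedHashMap<String, Double> sortedByValueDesc(Map<String, Double> totals) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        LinkedHashMap<String, Double> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries) {
            sorted.put(e.getKey(), e.getValue()); // insertion order is preserved
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Double> result = new HashMap<>();
        result.put("2017", 44.889);
        result.put("2015", 99.1);
        System.out.println(sortedByValueDesc(result)); // prints {2015=99.1, 2017=44.889}
    }
}
```

A LinkedHashMap is used because it keeps insertion order, so the sorted order survives when the map is iterated later.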
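I think you could also do more work in the state_group_and_total function to drop items from the map as you go along, which would keep the map from getting too big. That is only safe if each key shows up in a single row, though (as in the test table above); if a key can recur, an entry dropped early might still have grown into the top N.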
Source: https://stackoverflow.com/questions/31828952/how-to-get-top-5-records-in-cassandra-2-2