How to get top 5 records in Cassandra 2.2

Submitted by 為{幸葍}努か on 2019-12-06 05:44:45

Question


I need some help. I have a query that should return the top 5 records, grouped by date (the date only, not date + time), with the amounts summed.

I wrote the following, but it returns all the records, not just the top 5:

CREATE OR REPLACE FUNCTION state_groupbyandsum( state map<text, double>, datetime text, amount text )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
     String date = datetime.substring(0, 10);
     Double count = (Double) state.get(date);
     if (count == null)
         count = Double.parseDouble(amount);
     else
         count = count + Double.parseDouble(amount);
     state.put(date, count);
     return state;
';


CREATE OR REPLACE AGGREGATE groupbyandsum(text, text) 
SFUNC state_groupbyandsum
STYPE map<text, double>
INITCOND {};

select groupbyandsum(datetime, amount) from warehouse;

Could you please help me get just the top 5 records?


Answer 1:


Here's one way to do that. Your group-by state function could look like this:

CREATE FUNCTION state_group_and_total( state map<text, double>, type text, amount double )
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
     Double count = (Double) state.get(type);
     if (count == null)
         count = amount;
     else
         count = count + amount;
     state.put(type, count);
     return state;
';

That will build up a map of totals, keyed by type, over all the rows selected by your query's WHERE clause. Now the tricky part is how to keep just the top N. One way to do it is with a FINALFUNC, which gets executed once after all the rows have been put in the map. The function below uses a loop to find the maximum value in the map and move that entry to a result map, so to find the top N it iterates over the map N times. There are more efficient algorithms than this; it's just a quick and dirty example.

Here's an example that finds the top two:

CREATE FUNCTION topFinal (state map<text, double>)
CALLED ON NULL INPUT
RETURNS map<text, double>
LANGUAGE java AS '
    java.util.Map<String, Double> inMap = new java.util.HashMap<String, Double>(),
                                  outMap = new java.util.HashMap<String, Double>();

    inMap.putAll(state);

    int topN = 2;
    for (int i = 1; i <= topN; i++) {
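        // Start below every possible total so that negative totals can be selected too.
        double maxVal = Double.NEGATIVE_INFINITY;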
        String moveKey = null;
        for (java.util.Map.Entry<String, Double> entry : inMap.entrySet()) {

            if (entry.getValue() > maxVal) {
                maxVal = entry.getValue();
                moveKey = entry.getKey();
            }
        }
        if (moveKey != null) {
            outMap.put(moveKey, maxVal);
            inMap.remove(moveKey);
        }
    }

    return outMap;
';
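As an aside on the "more efficient algorithms" remark: one common alternative to scanning the map N times is a min-heap capped at N entries. The following is only a minimal sketch in plain Java, meant for trying the idea outside Cassandra; the class name TopNHeapSketch and the method topN are hypothetical, and it assumes the map contains no null values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper class for experimenting; not part of Cassandra.
class TopNHeapSketch {

    // Returns the n entries with the largest values, using a min-heap capped at n entries.
    // Assumes no null values in the input map.
    static Map<String, Double> topN(Map<String, Double> totals, int n) {
        Map<String, Double> out = new HashMap<>();
        if (n <= 0) {
            return out;
        }
        // The heap root is always the smallest value among the current candidates.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>(n, Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> entry : totals.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest candidate
            }
        }
        // Whatever is left in the heap is the top n (in no particular order).
        for (Map.Entry<String, Double> entry : heap) {
            out.put(entry.getKey(), entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> totals = new HashMap<>();
        totals.put("2015", 99.1);
        totals.put("2016", 18.12);
        totals.put("2017", 44.889);
        System.out.println(topN(totals, 2)); // keeps the 2015 and 2017 entries
    }
}
```

For a map of M entries this does one pass with O(M log N) work instead of N passes with O(M * N) work. If it behaves as wanted, the body of topN could be adapted into the FINALFUNC, keeping in mind that the function body there is wrapped in single quotes.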

Lastly, you need to define the AGGREGATE that calls the two functions you defined:

CREATE OR REPLACE AGGREGATE group_and_total(text, double) 
     SFUNC state_group_and_total 
     STYPE map<text, double> 
     FINALFUNC topFinal
     INITCOND {};
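To make the flow of the aggregate easier to picture: the state starts as the INITCOND (an empty map), the SFUNC is applied once per selected row, and the FINALFUNC is applied once to the final state. The sketch below imitates that sequence in plain Java. It does not talk to Cassandra; the class AggregateFlowSketch and its method names are hypothetical stand-ins, and it assumes non-null amounts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the aggregate; Cassandra itself is not involved here.
class AggregateFlowSketch {

    // Plays the role of state_group_and_total: add the amount to the running total for the key.
    static Map<String, Double> stateStep(Map<String, Double> state, String key, Double amount) {
        state.merge(key, amount, Double::sum);
        return state;
    }

    // Plays the role of topFinal: repeatedly move the largest remaining entry to the result.
    static Map<String, Double> finalStep(Map<String, Double> state, int topN) {
        Map<String, Double> remaining = new HashMap<>(state);
        Map<String, Double> out = new HashMap<>();
        for (int i = 0; i < topN && !remaining.isEmpty(); i++) {
            String maxKey = null;
            for (Map.Entry<String, Double> e : remaining.entrySet()) {
                if (maxKey == null || e.getValue() > remaining.get(maxKey)) {
                    maxKey = e.getKey();
                }
            }
            out.put(maxKey, remaining.remove(maxKey));
        }
        return out;
    }

    // Imitates the aggregate: empty initial state, one state step per row, one final step.
    static Map<String, Double> aggregate(String[] keys, double[] amounts, int topN) {
        Map<String, Double> state = new HashMap<>(); // corresponds to INITCOND {}
        for (int i = 0; i < keys.length; i++) {
            state = stateStep(state, keys[i], amounts[i]);
        }
        return finalStep(state, topN);
    }

    public static void main(String[] args) {
        String[] keys = {"2015", "2016", "2017"};
        double[] amounts = {99.1, 18.12, 44.889};
        System.out.println(aggregate(keys, amounts, 2)); // keeps the 2015 and 2017 entries
    }
}
```

Note that the whole map of totals is held in the state until the final step runs, which is why the size of the map matters for large result sets.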

So let's see if that works.

CREATE table test (partition int, clustering text, amount double, PRIMARY KEY (partition, clustering));
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2015', 99.1);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2016', 18.12);
INSERT INTO test (partition , clustering, amount) VALUES ( 1, '2017', 44.889);
SELECT * from test;

 partition | clustering | amount
-----------+------------+--------
         1 |       2015 |   99.1
         1 |       2016 |  18.12
         1 |       2017 | 44.889

Now, drum roll...

SELECT group_and_total(clustering, amount) from test where partition=1;

 agg.group_and_total(clustering, amount)
-------------------------------------------
            {'2015': 99.1, '2017': 44.889}

So you see it kept the top 2 rows based on the amount.

Note that the keys won't be in sorted order since it's a map, and I don't think we can control the key order in the map, so sorting in the FINALFUNC would be a waste of resources. If you need the map sorted then you could do that in the client.
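If the client needs the result ranked by amount, a sort on the client side could look roughly like the sketch below. It assumes the aggregate's result has already been read into a java.util.Map by whatever driver is in use; the class ClientSortSketch and the method sortByValueDescending are hypothetical names, and no driver API is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side helper; how the map is fetched from Cassandra is left out.
class ClientSortSketch {

    // Returns a copy of the map whose iteration order is largest value first.
    static LinkedHashMap<String, Double> sortByValueDescending(Map<String, Double> result) {
        return result.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,          // keys are unique, so this merge is never used
                        LinkedHashMap::new)); // LinkedHashMap preserves insertion order
    }

    public static void main(String[] args) {
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("2017", 44.889);
        result.put("2015", 99.1);
        System.out.println(sortByValueDescending(result)); // prints {2015=99.1, 2017=44.889}
    }
}
```

Swapping comparingByValue for comparingByKey would give an ordering by key instead, if that is what the client wants.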

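I think you could also do more work in the state_group_and_total function and drop items from the map as you go along, which might keep the map from getting too big. That is only safe when a dropped key cannot receive more rows later, since its running total could otherwise still grow into the top N.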



Source: https://stackoverflow.com/questions/31828952/how-to-get-top-5-records-in-cassandra-2-2
