I am recording data on users searching for various keywords. What I'd like to produce is a report of all of the unique keywords that the users have searched for, sorted in
According to the eBay tech blog, it's not unusual to store your counter values in the key itself. So to store the number of times Bob, Ken, and Jimmy have logged into a website, a single row would look as follows:
logins: [(0001_Bob,''), (0002_Bob, ''), ..., (0010_Ken, ''), (0012_Jimmy, ''), ...]
Notice that your keys will automatically sort themselves with the highest count at the tail end, so reading the top entry is close to a constant-time look-up.
Note that every time a user logs in, a new column key is created. You'd have to keep track of the number of log-ins in another row so that you have a fast look-up for how many log-ins have occurred so far and what integer value your next key should have:
login_count: [(Bob, 2), (Ken, 10), (Jimmy, 10), ...]
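The bookkeeping between the two rows above can be sketched in plain Python. This is only an in-memory model, not Cassandra client code: the names `record_login`, `top_column`, and `PAD` are hypothetical, the sorted list stands in for the `logins` row, and the dict stands in for the `login_count` row. The four-digit zero padding is taken from the sample keys.

```python
import bisect

PAD = 4  # assumed key width, matching sample keys such as 0001_Bob

login_count = {}  # models the login_count row: user -> log-ins so far
logins = []       # models the logins row: column names kept in sorted order


def record_login(user):
    """Bump the user's count and add a new count-prefixed column name."""
    count = login_count.get(user, 0) + 1
    login_count[user] = count
    # Zero padding makes lexicographic order agree with numeric order.
    bisect.insort(logins, f"{count:0{PAD}d}_{user}")
    return count


def top_column():
    """Return the column name with the highest count (the tail of the row)."""
    return logins[-1] if logins else None
```

As in the scheme described above, every log-in adds a new column and the earlier ones stay in place, so the row grows with the total number of log-ins rather than the number of users.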
You could use each keyword as a row key, and use a counter column for each row to track the number of searches. You could then produce a report by scanning over every row and reading the counters. Cassandra won't sort the results (assuming you use the default RandomPartitioner rather than an OrderPreservingPartitioner), but given that there will presumably only be a few tens of thousands of keywords, you can easily sort them at the client.
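A rough sketch of the client-side step, again in plain Python rather than real Cassandra calls: a `Counter` stands in for the per-keyword counter columns, and `record_search` and `keyword_report` are hypothetical names. The sort order is an assumption (descending by search count, ties broken alphabetically), since the question does not spell it out.

```python
from collections import Counter


def record_search(counters, keyword):
    """Model incrementing the counter column in the keyword's row."""
    counters[keyword] += 1


def keyword_report(counters):
    """Model scanning every row, then sorting the results at the client."""
    # Assumed order: most-searched first, then alphabetical for ties.
    return sorted(counters.items(), key=lambda item: (-item[1], item[0]))
```

With only tens of thousands of keywords, a sort like this is cheap compared with the scan itself.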