Is it possible to retrieve random rows from Cassandra (using it with Python/pycassa)?
Update: By random rows I mean randomly selected rows!
You might be able to do this by making a get_range request with a random start key (just a random string), and a row_count of 1.
From memory, I think the finish key would need to be the same as start, so that the query 'wraps around' the keyspace; this would normally return all rows, but the row_count will limit that.
I haven't tried it, but this should ensure you get a single result without having to know exact row keys.
Not sure what you mean by random rows. If you mean random access rows, then sure you can do it very easily:
import pycassa.pool
import pycassa.columnfamily
# Connect to the keyspace and open the column family
pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
# Fetch a single row by its key
row = cf.get('row_key')
That will give you any row. If you mean that you want a randomly selected row, I don't think you'd be able to do that very easily without knowing what the keys are. You could generate an index row, select a random column from it, and use that to grab a row from another column family. Basically, you'd need to create a new row where each column value is a row key from the column family from which you are trying to select a row. Then you could grab a column randomly from that row, and you have the key to a random row.
I don't think pycassa offers any support to grab a random, non-indexed row.
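A minimal sketch of the index-row idea described above, assuming a separate column family that holds a single index row whose column values are the row keys of the data column family. The names random_row_via_index, data_cf, index_cf and the index key 'all_keys' are hypothetical, and the column_count of 10000 is an arbitrary cap chosen for illustration. Only ColumnFamily.get is used on the pycassa side:

```python
import random

def random_row_via_index(data_cf, index_cf, index_key='all_keys', max_keys=10000):
    """Pick a random row from data_cf using an index row stored in index_cf.

    Assumption: index_cf contains one row (index_key) whose column
    values are the row keys of data_cf.
    """
    # Read the index row; column_count caps how many columns come back.
    index_row = index_cf.get(index_key, column_count=max_keys)
    # Each column value is a row key, so choose one of them at random.
    row_key = random.choice(list(index_row.values()))
    # Fetch the actual row from the data column family.
    return row_key, data_cf.get(row_key)
```

This reads the whole index row on every call, so it only suits an index that stays reasonably small, and the index has to be kept in sync whenever rows are added or removed.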
This works for my case:
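import random
# Use a random number (as a string) as the start key and fetch one row, keys only
ini = random.randint(0, 999999999)
rows = col_fam.get_range(str(ini), row_count=1, column_count=0, filter_empty=False)
You'll have to adapt this to your row key type (string in my case).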
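The snippet above could be wrapped in a small helper along these lines. This is only a sketch: random_row and max_key are hypothetical names, col_fam is assumed to be a pycassa ColumnFamily with string row keys, and the fallback query is an addition for the case where the random start position lands after the last row and the first query comes back empty:

```python
import random

def random_row(col_fam, max_key=999999999):
    """Return (key, columns) for a randomly chosen row, or None if there are no rows."""
    # Random start key, as in the snippet above (string row keys assumed).
    start = str(random.randint(0, max_key))
    # First try: the first row at or after the start key in partitioner order.
    for key, columns in col_fam.get_range(start=start, row_count=1):
        return key, columns
    # Nothing after the start position: fall back to the first row overall.
    for key, columns in col_fam.get_range(row_count=1):
        return key, columns
    return None
```

Note that this does not sample rows uniformly: a row that follows a large gap in partitioner order is returned more often than one that sits right behind another row.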
Source: https://stackoverflow.com/questions/9566060/cassandra-pycassa-getting-random-rows