Question
Is there a way to retrieve random rows from Cassandra (using it with Python/Pycassa)?
Update: By random rows I mean randomly selected rows!
Answer 1:
You might be able to do this by making a get_range request with a random start key (just a random string) and a row_count of 1.
From memory, I think the finish key would need to be the same as start, so that the query 'wraps around' the keyspace; this would normally return all rows, but the row_count will limit that.
I haven't tried it, but this should ensure you get a single result without having to know exact row keys.
Answer 2:
Not sure what you mean by random rows. If you mean random access rows, then sure you can do it very easily:
import pycassa.pool
import pycassa.columnfamily
pool = pycassa.pool.ConnectionPool('keyspace', ['localhost:9160'])
cf = pycassa.columnfamily.ColumnFamily(pool, 'cfname')
row = cf.get('row_key')
That will give you any row whose key you know. If you mean that you want a randomly selected row, I don't think you'd be able to do that very easily without knowing what the keys are. You could build an index row, select a random column from it, and use that to grab a row from another column family. Basically, you'd need to create a new row where each column value is a row key from the column family you are trying to select from. Then you could grab a column randomly from that row, and you have the key to a random row.
I don't think pycassa offers any support to grab a random, non-indexed row.
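The index-row idea above could be sketched roughly as follows. This is only an illustration, not something from the original answer: the function name random_row_via_index, the index row key 'all_keys', and the max_columns limit are all made up, and it assumes the index row is small enough to read in one call and that your application keeps it up to date.

```python
import random


def random_row_via_index(index_cf, data_cf, index_row_key="all_keys",
                         max_columns=10000):
    """Pick a random row from data_cf using an index row stored in index_cf.

    Assumptions (hypothetical, for illustration only):
    - index_cf and data_cf are pycassa ColumnFamily objects,
    - the row `index_row_key` in index_cf has one column per data row,
      and each column *value* is a row key of data_cf,
    - the index row has at most `max_columns` columns.
    """
    # ColumnFamily.get returns a dict-like mapping of column name -> value.
    index_row = index_cf.get(index_row_key, column_count=max_columns)
    # Choose one of the stored row keys uniformly at random.
    row_key = random.choice(list(index_row.values()))
    return row_key, data_cf.get(row_key)
```

With this layout every indexed key is equally likely to be chosen, at the cost of maintaining the index row on every insert and delete.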
Answer 3:
This works for my case:
import random
ini = random.randint(0, 999999999)
rows = col_fam.get_range(str(ini), row_count=1, column_count=0, filter_empty=False)
You'll have to adapt it to your row key type (a string in my case).
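Note that get_range returns a generator of (key, columns) pairs, so the result still has to be consumed. A minimal sketch of wrapping this up is below. The helper name random_row_key and the max_key parameter are made up for illustration, and the fallback to the start of the range when nothing comes back is my own assumption about how you might handle a random start that lands past the last row. The pick is also not guaranteed to be uniform, since it depends on how the keys are spread over the ring.

```python
import random


def random_row_key(col_fam, max_key=999999999):
    """Return the key of one randomly chosen row, or None if there are no rows.

    col_fam is assumed to be a pycassa ColumnFamily with string row keys.
    """
    start = str(random.randint(0, max_key))
    # Try the random start first; if that yields nothing, retry from the
    # beginning of the range (empty start key).
    for attempt_start in (start, ""):
        rows = col_fam.get_range(start=attempt_start, row_count=1,
                                 column_count=0, filter_empty=False)
        for key, _columns in rows:
            return key
    return None
```

Once you have the key, col_fam.get(key) fetches the row's columns.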
Source: https://stackoverflow.com/questions/9566060/cassandra-pycassa-getting-random-rows