We have been using Cassandra 0.6 and now have Column Families with millions of keys. We are interested in using the new Secondary Index feature available in 0.7 but cou
Secondary indexes are stored as Column Families that are not accessible to the user. Their size will be roughly:
(cardinality of the set of indexed values * the average size of an indexed value) + (the number of keys in the indexed column family * the average size of a key in the column family).
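To get a feel for the numbers, here is a small sketch that just plugs values into the formula above. The function name and the sample figures (1,000 distinct values of 16 bytes, 10 million keys of 32 bytes) are made up for illustration, and the result counts only raw value and key bytes, so treat it as a rough lower bound rather than actual on-disk usage.

```python
def estimate_index_size_bytes(distinct_values, avg_value_size,
                              num_keys, avg_key_size):
    """Rough secondary index size, following the formula above.

    All arguments are hypothetical inputs you would measure or guess
    for your own column family; sizes are in bytes.
    """
    # One entry per distinct indexed value.
    value_bytes = distinct_values * avg_value_size
    # Each indexed row key is recorded under the value it maps to.
    key_bytes = num_keys * avg_key_size
    return value_bytes + key_bytes


# Made-up example: 1,000 distinct values, 10 million row keys.
size = estimate_index_size_bytes(
    distinct_values=1_000,
    avg_value_size=16,
    num_keys=10_000_000,
    avg_key_size=32,
)
print(f"{size} bytes (~{size / 1024**2:.0f} MiB)")
```

With millions of keys the second term dominates, so the index size is driven mostly by the number and size of row keys rather than by how many distinct values the column has.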
Nodes only index rows that are stored locally -- that is, only rows for which they are a replica.
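The following toy model illustrates what "only rows for which they are a replica" means in practice. It is not Cassandra code: the placement function, node count, replication factor, and sample rows are all assumptions chosen to keep the sketch short, and the hash-modulo placement is a stand-in for a real partitioner.

```python
import hashlib
from collections import defaultdict


def replicas_for(key, num_nodes, replication_factor):
    """Toy placement (an assumption, not Cassandra's partitioner):
    hash the key to a starting node, then take the next nodes in order."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % num_nodes
    return [(start + i) % num_nodes for i in range(replication_factor)]


def build_local_indexes(rows, num_nodes, replication_factor, indexed_column):
    """Each node indexes only the rows it stores as a replica."""
    # indexes[node] maps an indexed value to the set of local row keys.
    indexes = [defaultdict(set) for _ in range(num_nodes)]
    for key, columns in rows.items():
        for node in replicas_for(key, num_nodes, replication_factor):
            indexes[node][columns[indexed_column]].add(key)
    return indexes


# Hypothetical sample data: row key -> columns.
rows = {
    "user1": {"state": "TX"},
    "user2": {"state": "CA"},
    "user3": {"state": "TX"},
    "user4": {"state": "NY"},
}

indexes = build_local_indexes(rows, num_nodes=4, replication_factor=2,
                              indexed_column="state")
for node, index in enumerate(indexes):
    print(node, {value: sorted(keys) for value, keys in index.items()})
```

In this model no single node holds the complete index: each key shows up in exactly as many per-node indexes as it has replicas, and nowhere else.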