I'm looking at implementing a CF in Cassandra that has very long rows (hundreds of thousands to millions of columns per row).
Using entirely dummy data, I've inserted
A good resource on this is Aaron Morton's blog post on Cassandra's Reversed Comparators. From the article:
Recall from my post on Cassandra Query Plans that once rows get to a certain size they include an index of the columns. And that the entire index must be read whenever any part of the index needs to be used, which is the case when using a Slice Range that specifies start or reversed. So the fastest slice query to run against a row was one that retrieved the first X columns in a row by only specifying a column count.
If you are mostly reading from the end of a row (for example, if you are storing things by timestamp and you mostly want to look at recent data), you can use the Reversed Comparator, which stores your columns in descending order. This will give you much better (and more consistent) query performance.
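To make the idea concrete, here is a toy model in plain Python. It is not Cassandra's API, just a sketch of why the storage order matters. The timestamps and the `first_n` helper are made up for illustration. The point is that with reversed ordering, "give me the latest N columns" turns into "give me the first N columns", which is the cheap head-of-row slice described in the quote above.

```python
from itertools import islice

def first_n(columns, n):
    """Head slice: take the first n columns in stored order (hypothetical helper)."""
    return list(islice(columns, n))

# Hypothetical row whose column names are integer timestamps.
timestamps = [1000, 1001, 1002, 1003, 1004]

ascending_row = sorted(timestamps)                # default comparator order
reversed_row = sorted(timestamps, reverse=True)   # reversed comparator order

# With ascending storage the newest columns sit at the end of the row,
# so "latest 2" has to be expressed as a reversed slice.
latest_from_ascending = list(islice(reversed(ascending_row), 2))

# With reversed storage, "latest 2" is simply the head of the row.
latest_from_reversed = first_n(reversed_row, 2)

print(latest_from_reversed)
```

Both approaches return the same columns; the difference in a real cluster would be in how much of the row's column index has to be read to get them.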
If your read patterns are more random you might be better off partitioning your data across multiple rows.
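One common way to partition is to put a time bucket into the row key, so each row only holds the columns for one interval. The sketch below is only an illustration: the `bucketed_row_key` function, the `base:bucket_start` key format, and the one-day bucket width are all assumptions, and you would pick a width that keeps rows at a size that performs well for your data.

```python
from datetime import datetime, timezone

def bucketed_row_key(base_key, ts, bucket_seconds=86400):
    """Build a row key that groups columns into fixed-width time buckets.

    base_key and the key format are hypothetical; bucket_seconds defaults
    to one day. ts must be a timezone-aware datetime.
    """
    epoch_seconds = int(ts.timestamp())
    # Round down to the start of the bucket containing ts.
    bucket_start = epoch_seconds // bucket_seconds * bucket_seconds
    return f"{base_key}:{bucket_start}"

morning = datetime(2012, 1, 1, 8, 0, tzinfo=timezone.utc)
evening = datetime(2012, 1, 1, 20, 0, tzinfo=timezone.utc)
next_day = datetime(2012, 1, 2, 8, 0, tzinfo=timezone.utc)

# Same day -> same row; next day -> a new row.
print(bucketed_row_key("sensor-1", morning))
print(bucketed_row_key("sensor-1", next_day))
```

A read for a given time range then only touches the rows whose buckets overlap that range, and no single row grows without bound.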