Cassandra performance for long rows

情深已故 2021-02-02 03:17

I'm looking at implementing a CF in Cassandra that has very long rows (hundreds of thousands to millions of columns per row).

Using entirely dummy data, I've inserted

2 Answers

一生所求 (original poster)

2021-02-02 04:02

psanford's comment led me to the answer. It turns out that Cassandra versions before 1.1.0 (1.1.0 is currently in beta) are slow at slicing long rows that are still in memtables (not yet flushed to disk), but perform better on the same data once it has been flushed to SSTables on disk.

See http://mail-archives.apache.org/mod_mbox/cassandra-user/201201.mbox/%3CCAA_K6YvZ=vd=Bjk6BaEg41_r1gfjFaa63uNSXQKxgeB-oq2e5A@mail.gmail.com%3E and https://issues.apache.org/jira/browse/CASSANDRA-3545.

With my example, the first 1.8 million rows had been flushed to disk, so slices over that range were fast, but the last ~200,000 rows hadn't been flushed to disk and were still in memtables. Because memtable slicing is slow on long rows, I saw bad performance at the end of the rows (my data was inserted in column order).
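For anyone trying to confirm the same pattern, here is a minimal timing sketch. It assumes you already have some way to read a slice of columns from the row; `fetch_slice(offset, count)` is a hypothetical callable you supply yourself, not part of any Cassandra client library. The idea is only to time equally sized slices at increasing offsets and see where latency jumps.

```python
import time


def time_slices(fetch_slice, total_columns, window, step):
    """Time equally sized slices at increasing offsets along one long row.

    fetch_slice is a hypothetical callable (offset, count) that you provide;
    it should perform the actual slice query against your cluster.
    Returns a list of (offset, elapsed_seconds) tuples.
    """
    timings = []
    for offset in range(0, total_columns, step):
        started = time.perf_counter()
        fetch_slice(offset, window)
        timings.append((offset, time.perf_counter() - started))
    return timings


def slowest(timings, n=5):
    """Return the n slowest (offset, elapsed_seconds) pairs, slowest first."""
    return sorted(timings, key=lambda pair: pair[1], reverse=True)[:n]
```

If the slowest offsets cluster at the most recently written end of the row, that is consistent with the unflushed-memtable explanation above, although it does not prove it.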

This can be worked around by manually flushing the memtables on the Cassandra nodes. A patch for this has been applied to 1.1.0, and I can confirm that it fixes the issue for me.
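One way to do the manual flush is with `nodetool flush`, run against each node. The sketch below just wraps that command in Python; the host addresses, keyspace name, and column family name in the usage comment are placeholders I made up, and it assumes `nodetool` is on your PATH and can reach each node's JMX port.

```python
import subprocess


def flush_command(host, keyspace, column_family):
    """Build the nodetool call that flushes one column family on one node."""
    return ["nodetool", "-h", host, "flush", keyspace, column_family]


def flush_all(hosts, keyspace, column_family, run=subprocess.check_call):
    """Flush the column family's memtables on every listed node, one at a time.

    run is injectable so the commands can be inspected without executing them.
    """
    for host in hosts:
        run(flush_command(host, keyspace, column_family))


# Example usage (placeholder names, adjust to your cluster):
# flush_all(["10.0.0.1", "10.0.0.2"], "my_keyspace", "long_rows")
```

Flushing only moves the data from memtables to SSTables, so new writes to the same rows will be slow to slice again until the next flush. On versions before 1.1.0 this is a workaround rather than a permanent fix.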

I hope this helps anyone else with the same problem.
