What is the difference between the Scylla read path and the Cassandra read path?

╄→尐↘猪︶ㄣ submitted on 2020-01-23 12:15:56

Question


What is the difference between the Scylla read path and the Cassandra read path? When I stress-test both on a 16-core machine with ordinary HDDs, Scylla's read performance is about 5 times worse than Cassandra's.

I expected better read performance from Scylla than from Cassandra on ordinary HDDs; my company doesn't provide SSDs.

Can someone please confirm whether it is possible to achieve better read performance with ordinary HDDs?

If so, what changes are required in the Scylla configuration? Please guide me!


Answer 1:


There can be various reasons why you are not getting the most out of your Scylla cluster.

  1. The number of concurrent connections from your clients/loaders is not high enough, or you're not using enough loaders. In that case, some shards will be doing all the work while others sit mostly idle. You want to keep your parallelism high.

  2. Scylla likes to have a minimum of 2 connections per shard (you can see the number of shards in /etc/scylla.d/cpuset.conf).

  3. What's the size of your dataset? Are you reading a large number of partitions or just a few? You might be hitting a hot partition.
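To make the connection guideline in point 2 concrete, here is a minimal sketch that estimates how many client connections a node would want. It assumes one shard per CPU assigned to Scylla and that the CPU list can be written as a cpuset-style string such as "0-15" or "0-7,16-23"; the function names and the sample string are made up for illustration and are not part of any Scylla tool.

```python
def count_shards(cpuset: str) -> int:
    """Count the CPUs in a cpuset-style list such as "0-7,16-23".

    Assumption: one Scylla shard runs on each listed CPU.
    """
    total = 0
    for part in cpuset.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "0-7" covers both endpoints.
            low, high = part.split("-")
            total += int(high) - int(low) + 1
        else:
            # A single CPU id like "5".
            total += 1
    return total


def min_connections_per_node(shards: int, per_shard: int = 2) -> int:
    """Apply the "at least 2 connections per shard" rule of thumb."""
    return shards * per_shard


# Hypothetical 16-core node with all cores given to Scylla.
shards = count_shards("0-15")
print(shards, min_connections_per_node(shards))
```

If the loaders together open fewer connections than this, some shards are likely to receive no traffic at all, which matches the "some shards idle" symptom from point 1.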

I strongly recommend reading the following docs, which will give you more insight:

  • https://www.scylladb.com/2019/03/27/best-practices-for-scylla-applications/

  • https://docs.scylladb.com/operating-scylla/benchmarking-scylla/




Answer 2:


@Sateesh, I want to add to the answer by @TomerSan that both Cassandra and ScyllaDB use the same disk storage architecture, the log-structured merge tree (LSM tree). That means they have roughly the same disk access patterns, because the algorithms are largely the same. LSM trees were designed around the idea that instant in-place updates are not necessary. An LSM tree consists of immutable data buckets, each a large contiguous piece of data on disk. That means less random I/O and more sequential I/O, which HDDs handle well (leaving aside the parallelism that modern database implementations exploit).
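As a rough illustration of the LSM idea described here, the following toy sketch keeps writes in an in-memory table and flushes them into immutable sorted runs, and a read checks memory first and then the runs from newest to oldest. It is a simplified model only: the class name and its parameters are invented for this example, and it leaves out the commit log, compaction, bloom filters, and everything else a real Cassandra or ScyllaDB node does.

```python
from bisect import bisect_left


class ToyLSM:
    """Toy LSM store: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}        # recent writes, held in memory
        self.sstables = []        # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value) -> None:
        # Writes never modify an existing run; they only touch the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as one sorted, immutable run
        # (on a real disk this would be a single sequential write).
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str):
        # Read path: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # reaches the limit, so the memtable is flushed
db.put("a", 3)   # newer value for "a" shadows the flushed one
print(db.get("a"), db.get("b"), db.get("missing"))
```

Note that a read may have to look at several runs before finding a key, which is where seeks on an HDD start to matter even though the writes themselves are sequential.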

All of the above means that the difference you see is not caused by a difference in how the two databases use the disk. It must be related to configuration differences and what happens underneath. Maybe ScyllaDB tries to use more parallelism, or compacts more aggressively. It depends.

In order to say anything specific, please share your tests, environments, and configurations.




Answer 3:


Both databases use an LSM tree, but Scylla adds a thread-per-core architecture on top, and it uses O_DIRECT while C* relies on the page cache. Scylla also has a sophisticated I/O scheduler that makes sure not to overload the disk; to tune it, scylla_setup automatically runs a benchmark. Check its output in io.conf.

There are far more things to review, so it is better to send your data to the mailing list. In general, Scylla should perform better in this case as well, but your disk is likely to be the bottleneck in both cases.




Answer 4:


In summary, I would say ScyllaDB and Cassandra have the same read/write path: memtable, commitlog, SSTable.

However, the implementation is very different:

  • Cassandra relies on the OS for low-level I/O and networking (most DBMSs do).

  • ScyllaDB relies on its own library (Seastar) to handle I/O and networking at a low level, independently of the OS page cache etc.

This is why ScyllaDB can provide features such as workload scheduling within the same cluster, which would be very hard to implement in Cassandra.



Source: https://stackoverflow.com/questions/59677972/what-is-the-difference-between-scylla-read-path-and-cassandra-read-path
