How to increase write performance in Cassandra?

Submitted by 孤街醉人 on 2019-12-06 13:43:55
Jacek L.

Some of the factors you are asking about are:

  • Connection speed and latency between the client and the cluster, and between machines in the cluster (as mentioned by @omnibear)
  • Replication factor: if you insert emails one after another, the replication factor may affect the latency of each single operation, which increases the total time. In other words, you may want to consider batching write operations.
  • You've written that you use i3/8GB: is that the configuration of the client or of the server machines? The configuration of the server machines, especially the amount of memory and the other processes running on them, can obviously affect performance.
  • Commit log and data file location: it is recommended to place the commit log on a separate physical disk from the data files.
  • Compaction strategy: I bet it does not matter in your case, but in general it also affects write performance. Cassandra first writes data to the commit log and the memtable; memtables are then flushed to SSTables, and finally SSTables are merged (this is called compaction). The parameters of this process can be tuned to improve performance for particular use cases; you can read about the write path in C* here.
  • You can also browse the excellent DataStax documentation notes on performance: (http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_throughput_c.html), (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) and (http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html)
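
The replication-factor point above is that inserting rows one after another makes each request's latency add up. Below is a minimal, self-contained Python sketch of that effect. It assumes a hypothetical `fake_insert` stub that only sleeps for a made-up 10 ms round trip in place of a real driver call, and compares waiting for each write before sending the next with keeping several writes in flight through a thread pool. Note that it illustrates overlapping (concurrent) requests, which is a different technique from a CQL BATCH statement.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.01  # hypothetical round-trip time of one write


def fake_insert(email: str) -> str:
    """Hypothetical stand-in for a driver write: it only sleeps for one round trip."""
    time.sleep(SIMULATED_LATENCY_S)
    return email


def write_sequentially(emails: list[str]) -> float:
    """Send one write, wait for its reply, then send the next; return elapsed seconds."""
    start = time.perf_counter()
    for email in emails:
        fake_insert(email)
    return time.perf_counter() - start


def write_concurrently(emails: list[str], max_in_flight: int = 10) -> float:
    """Keep up to max_in_flight writes outstanding at once; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        list(pool.map(fake_insert, emails))  # consume results so any exception surfaces
    return time.perf_counter() - start


if __name__ == "__main__":
    emails = [f"user{i}@example.com" for i in range(50)]
    print(f"sequential: {write_sequentially(emails):.3f}s")
    print(f"concurrent: {write_concurrently(emails):.3f}s")
```

With 50 writes the sequential loop takes at least 50 × 10 ms, while the concurrent version needs only a handful of round trips. Real gains against a cluster depend on the driver, the network and the nodes.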
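
To check the commit log recommendation on an existing node, a small Python sketch can compare the device IDs (`st_dev`) that `os.stat` reports for the two directories. The paths below are assumptions (common package-install defaults); substitute the `commitlog_directory` and `data_file_directories` values from your own cassandra.yaml. Equal device IDs mean both directories are on the same filesystem; different IDs do not prove separate physical disks, since two partitions of one disk also have different IDs.

```python
import os


def same_filesystem(commitlog_dir: str, data_dir: str) -> bool:
    """Return True if both directories report the same device ID (same filesystem).

    Different IDs do not guarantee separate physical disks: two partitions
    on one disk also have different IDs.
    """
    return os.stat(commitlog_dir).st_dev == os.stat(data_dir).st_dev


if __name__ == "__main__":
    # Hypothetical paths: replace with the values from your cassandra.yaml.
    commitlog = "/var/lib/cassandra/commitlog"
    data = "/var/lib/cassandra/data"
    if os.path.isdir(commitlog) and os.path.isdir(data):
        if same_filesystem(commitlog, data):
            print("commit log and data files share a filesystem")
        else:
            print("commit log and data files are on different filesystems")
```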
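
The write path from the compaction bullet (commit log and memtable, then flush to SSTables, then compaction) can be illustrated with a toy in-memory model in Python. This is a hypothetical sketch, not Cassandra's implementation: the class name `ToyNode`, the entry-count flush threshold and the logical clock are assumptions made for the example. Real Cassandra decides when to flush based on memtable size and other limits, and merges SSTables according to a configurable compaction strategy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of the write path (hypothetical; not Cassandra's real code)."""
    flush_threshold: int = 3  # assumption: flush after this many memtable entries
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)  # immutable, sorted snapshots
    clock: int = 0  # logical timestamp; newer writes get larger values

    def write(self, key, value):
        self.clock += 1
        # 1. Append to the commit log so the write survives a crash.
        self.commit_log.append((self.clock, key, value))
        # 2. Update the in-memory memtable.
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Write the memtable out as a new immutable, sorted SSTable;
        #    the commit log entries it covered are no longer needed.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log = []

    def compact(self):
        # 4. Merge all SSTables into one, keeping the newest value per key.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table:
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []

    def read(self, key):
        # Collect every version of the key and return the newest one.
        candidates = [self.memtable[key]] if key in self.memtable else []
        for table in self.sstables:
            candidates.extend(v for k, v in table if k == key)
        return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    node = ToyNode(flush_threshold=2)
    for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
        node.write(key, value)
    print(len(node.sstables), "SSTables before compaction")
    node.compact()
    print(len(node.sstables), "SSTable after compaction; a =", node.read("a"))
```

Each write touches only the append-only log and an in-memory map, which is why writes stay cheap; the cost of merging is deferred to compaction.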

As an aside, you may want to consider increasing the replication factor to 3, because RF=2 does not give you much: if you use consistency level QUORUM and one node fails, QUORUM reads and writes for the data replicated on that node will fail. If you use RF=3 with CL=QUORUM, you still have to read/write to 2 nodes to achieve strong consistency, but losing a node will no longer make the cluster unavailable.
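
The replication-factor arithmetic above can be checked with a few lines of Python, assuming a single datacenter where QUORUM = floor(RF / 2) + 1. The helper names are hypothetical. The sketch also encodes the R + W > RF overlap condition behind the strong-consistency remark.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond to a QUORUM request (single DC): floor(RF/2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Down replicas of a partition that QUORUM requests can survive."""
    return replication_factor - quorum(replication_factor)


def is_strongly_consistent(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """Reads see the latest write when the read and write replica sets must overlap."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    for rf in (2, 3):
        q = quorum(rf)
        print(f"RF={rf}: QUORUM={q}, survives {tolerated_failures(rf)} down replica(s), "
              f"R+W>RF: {is_strongly_consistent(rf, q, q)}")
```

Running it prints `RF=2: QUORUM=2, survives 0 down replica(s), R+W>RF: True` and `RF=3: QUORUM=2, survives 1 down replica(s), R+W>RF: True`: both settings are strongly consistent at QUORUM, but only RF=3 tolerates a failed replica.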

First, use the DataStax visual admin tool (http://www.datastax.com/products/datastax-enterprise-visual-admin) to find out how much time is actually spent in Cassandra.

You can also use

./nodetool cfstats

to collect statistics for each keyspace and the tables within it.

It seems to me that your writer is slow, as others have pointed out.
