I have a column family called Emails and I am saving mails into this CF. It takes 100+ seconds to write 5000 mails.
I am using an i3 processor with 8 GB of RAM. My data center has 6 nodes with replication factor = 2.
Does the size of the data we store in Cassandra affect performance? What factors affect write performance, and how do I increase it?
Thanks in advance.
Some of the factors you are asking about are:
- connection speed and latency between the client and the cluster, and between machines in the cluster (as mentioned by @omnibear)
- the replication factor you are using - if you insert emails one after another, the replication factor may affect the latency of each single operation, and those latencies add up to an increased total time; in other words, you may consider batching write operations.
- you've written that you use i3/8gb - is that the configuration of the client or of the server machines? The configuration of the server machines, especially the amount of memory and the other processes running on them, may affect performance
- commit log and data file locations - it is recommended to place the commit log on a separate physical disk from the data files
- compaction strategy - I bet it does not matter in your case, but in general it also affects write performance; Cassandra first writes data to the commit log and the memtable, then memtables are flushed to SSTables, and finally SSTables are merged (which is called compaction); the parameters of this process can be tuned to improve performance in particular use cases; you may read about the write path in C* here
- you can also browse great DataStax documentation notes regarding performance: (http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_throughput_c.html), (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html) and (http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html)
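To illustrate the point above about inserting emails one after another: 100 seconds for 5000 mails is an average of about 20 ms per write, which is in the range of a single round trip if each write waits for the previous one to finish. The sketch below is only a simulation, not Cassandra code: `write_email` is a hypothetical stand-in that sleeps for an assumed latency, and the worker count is arbitrary. It shows how overlapping writes reduces the total time even when the latency of each write stays the same.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.005  # assumed round-trip time per write; not a measured value


def write_email(email_id):
    """Hypothetical stand-in for one blocking write to the Emails column family."""
    time.sleep(SIMULATED_LATENCY_S)
    return email_id


def write_sequentially(email_ids):
    # Each write waits for the previous one, so total time is roughly n * latency.
    return [write_email(e) for e in email_ids]


def write_concurrently(email_ids, workers=20):
    # Up to `workers` writes are in flight at once, so their latencies overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_email, email_ids))


def timed(fn, *args):
    # Returns the function result together with the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(200))
    _, t_seq = timed(write_sequentially, ids)
    _, t_con = timed(write_concurrently, ids)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

With a real client the same idea is usually expressed through the driver's asynchronous API (for example, `Session.execute_async` in the DataStax Python driver) rather than a thread pool. Which approach fits best depends on the driver and the data model.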
As an aside, consider increasing the replication factor to 3, because RF=2 will not give you much: with consistency level = QUORUM, a quorum of 2 replicas means both of them, so if one node fails, requests for the data it holds cannot be served at that level. With RF=3 and CL=QUORUM you still have to read from and write to 2 nodes to achieve strong consistency, but losing a node will not make the cluster unavailable.
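The arithmetic behind this aside can be sketched in a few lines. A quorum is a majority of the replicas, i.e. floor(RF / 2) + 1; the function names below are only illustrative.

```python
def quorum(replication_factor):
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor):
    # Replicas that can be down while a QUORUM request can still succeed.
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {tolerable_replica_failures(rf)} replica(s) down")
```

For RF=2 the quorum is 2 and no replica may be down; for RF=3 the quorum is still 2, but one replica may be down.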
First, use the DataStax visual admin tool (http://www.datastax.com/products/datastax-enterprise-visual-admin) to find out how much of the time is actually spent inside Cassandra.
You can also use
./nodetool cfstats
to collect statistics on each keyspace and the tables within it.
It seems to me that your writer is slow, as pointed out by others.
Source: https://stackoverflow.com/questions/22836529/how-to-increase-the-write-performance-in-cassandra