Question
I'm seeing high load on my Cassandra nodes for reasons I can't pin down. Here is some information to give you the picture.
When I create a brand new cluster, the load stays low for a couple of days and then increases over time. After a week or so it shoots through the roof, which, as far as I can tell, makes the whole cluster unstable.
I take a snapshot of one of my keyspaces, containing around 300-400 MB of data, every 4 hours and delete snapshots older than 7 days, all configured in OpsCenter.
The cluster is running on striped disks in Microsoft Azure.
The nodes run on 2 cores with 3.5 GB of RAM. I'm well aware that this is below the recommended hardware, but it should not be the cause of the high load: I tried running on 4 cores with 7 GB of RAM and saw no difference.
I'm sure there's a whole range of things that could cause high load, but I guess some causes are more likely than others.
Edit
It appears that this high load is caused by the Repair Service in OpsCenter. There must be some settings to tweak how repairs are run by the service.
Answer 1:
You can configure the repair service by adding a [repair_service] section to your opscenterd.conf.
The main levers for tuning are:
max_parallel_repairs = 0
You can increase this until your repairs complete fast enough to finish within the time period you require (< gc_grace_seconds).
min_repair_time = 5
If you don't have that much data, the repair service may be completing too quickly and restarting -- causing unnecessary overhead. You can increase this value to ensure that you aren't running repair too frequently.
snapshot_override
Again, if you don't have much data and the repair service completes too quickly, you will be generating too many snapshots (by default, the repair service takes a snapshot before every repair). If your snapshot directory is filling up extremely quickly, you may want to turn this off until you have tuned the service to run only once (raise min_repair_time, lower max_parallel_repairs).
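To make the shape of that section concrete, here is a minimal sketch that renders a [repair_service] block with Python's standard configparser. The helper name render_repair_service is hypothetical, and the values 0 and 5 are simply the ones quoted above, not tuning recommendations. snapshot_override is left out unless you pass a value, since the accepted values should be checked against the OpsCenter documentation for your version.

```python
import configparser
import io


def render_repair_service(max_parallel_repairs, min_repair_time,
                          snapshot_override=None):
    """Return the text of a [repair_service] section for opscenterd.conf.

    Hypothetical helper for illustration; it only formats the options
    named in the answer and does not talk to OpsCenter.
    """
    config = configparser.ConfigParser()
    config["repair_service"] = {
        "max_parallel_repairs": str(max_parallel_repairs),
        "min_repair_time": str(min_repair_time),
    }
    # Only emit snapshot_override when the caller supplies a value; the
    # accepted values are an assumption to verify against the docs.
    if snapshot_override is not None:
        config["repair_service"]["snapshot_override"] = str(snapshot_override)

    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Values taken from the answer above; adjust them for your own cluster.
section_text = render_repair_service(max_parallel_repairs=0, min_repair_time=5)
print(section_text)
```

The printed text can be pasted into opscenterd.conf. A change like this would normally require restarting opscenterd to take effect, so try it on a test cluster first.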
Note: The point of the repair service is to spread out the expensive, resource-consuming process of repair into smaller jobs. This means that you may increase your overall CPU utilization by 5% or 10% at all times rather than having it spike and affect your workload during regular repair runs.
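The constraint mentioned for max_parallel_repairs (a full repair pass must finish in less than gc_grace_seconds) can be sanity-checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, the cycle durations are made-up inputs you would replace with what you observe in OpsCenter, and 864000 seconds (10 days) is Cassandra's default gc_grace_seconds, which may have been changed per table in your schema.

```python
SECONDS_PER_DAY = 86400

# Cassandra's default gc_grace_seconds (10 days). Tables can override it,
# so treat this as an assumption about your schema.
DEFAULT_GC_GRACE_SECONDS = 864000


def repair_cycle_fits(cycle_seconds, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if a full repair pass completes strictly within gc_grace_seconds."""
    return cycle_seconds < gc_grace_seconds


def headroom_days(cycle_seconds, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Days left between the end of a repair pass and gc_grace_seconds."""
    return (gc_grace_seconds - cycle_seconds) / SECONDS_PER_DAY


# Illustrative inputs only: a 7-day and a 12-day repair cycle.
seven_days = 7 * SECONDS_PER_DAY
twelve_days = 12 * SECONDS_PER_DAY

print(repair_cycle_fits(seven_days), headroom_days(seven_days))
print(repair_cycle_fits(twelve_days), headroom_days(twelve_days))
```

If the check fails for your observed cycle time, that is the situation in which raising max_parallel_repairs is worth the extra load.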
Details on advanced configuration
Source: https://stackoverflow.com/questions/28021344/high-load-on-cassandra-nodes