Due to massive load increases on our website, Redis is now struggling with peak load because the Redis server instance is reaching 100% CPU (on one of eight cores), resulting
My first, simple suggestion, if you haven't done it already, would be to turn off all RDB and AOF persistence on your master at the very least. Of course, your slaves might then fall behind if they're still saving to disk. See this for an idea of the cost of RDB dumps.
Another thing to do is to make sure you're pipelining all of your commands. If you're sending many commands individually that could be grouped into a pipeline, batching them saves a network round trip per command and you should see a bump in performance.
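As a rough illustration of pipelining, here is a minimal sketch assuming a redis-py style client (an object with `pipeline()`, whose pipeline has `set()` and `execute()`). The function name `set_many_pipelined` and the `batch_size` parameter are made up for this example and are not part of any library.

```python
def set_many_pipelined(client, items, batch_size=1000):
    """Write key/value pairs in batches, one network round trip per batch.

    `client` is assumed to be a redis-py style client; `items` is a dict.
    """
    results = []
    # transaction=False asks for plain pipelining, without MULTI/EXEC wrapping.
    pipe = client.pipeline(transaction=False)
    pending = 0
    for key, value in items.items():
        pipe.set(key, value)  # queued on the client side, nothing sent yet
        pending += 1
        if pending >= batch_size:
            results.extend(pipe.execute())  # send the whole batch at once
            pending = 0
    if pending:
        results.extend(pipe.execute())  # flush the final partial batch
    return results
```

With redis-py installed, this could be called as `set_many_pipelined(redis.Redis(), {"a": 1, "b": 2})`. Batching keeps any single pipeline from growing without bound, which matters because the server buffers the replies for a batch in memory.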
Also, this SO post has a good answer about profiling Redis.
More info about your use case and data structure would be helpful in deciding whether there's a simple change you could make to the way you're actually using Redis that would give you an improvement.
Edit: In response to your latest comment, it's good to note that every time a slave loses its connection and reconnects, it will re-sync with the master. In previous versions of Redis this was always a complete re-sync, so it was quite expensive. Apparently, in 2.8 the slave is now able to request a partial re-sync of just the data it has missed since its disconnection. I don't know much about the details, but if either your master or any of your slaves aren't on 2.8.* and you have a shaky connection, that could really hurt your CPU performance by constantly forcing your master to re-sync the slaves. More info here.
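To check which of your instances are new enough for partial re-sync, something like the following sketch could help. It assumes a redis-py style client whose `info()` returns a dict containing `redis_version`; the helper names `supports_partial_resync` and `instances_needing_upgrade` are hypothetical.

```python
def supports_partial_resync(client):
    """Return True if the server reports version 2.8 or later."""
    version = client.info()["redis_version"]  # e.g. "2.8.19"
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 8)


def instances_needing_upgrade(clients_by_name):
    """Given {name: client}, list the names of instances older than 2.8."""
    return sorted(
        name
        for name, client in clients_by_name.items()
        if not supports_partial_resync(client)
    )
```

Run it against the master and every slave, since the answer above suggests a single older instance on a shaky link may be enough to keep triggering full re-syncs.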
We found an issue inside our application. Updates to data in our Redis cache were communicated to each server's local memory cache through a Redis channel subscription.
Every time a local cache was flushed, items expired, or items were updated, messages were sent to all (35) web servers, which in turn started updating more items, and so on.
Disabling the messages for the updated keys improved our situation tenfold.
Network bandwidth dropped from 1.2 Gbps to 200 Mbps, and CPU utilization is now 40% at 150% of the load we had seen so far, at a moment of extreme calculations and updates.
The first thing to do would be to look at SLOWLOG GET 50 (or pick any number of rows). This shows the last 50 commands that took non-trivial amounts of time. It could be that some of the things you are doing are simply taking too long. I get worried if I see anything in the slowlog; I usually see items only every few days. If you are seeing lots of items constantly, then you need to investigate what you are actually doing on the server. One killer thing to never do is KEYS, but there are other things.
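For reference, here is a small sketch of reading the slow log from code, assuming a redis-py style client whose `slowlog_get(n)` returns dicts with `duration` (in microseconds) and `command` fields. The second helper uses SCAN as a non-blocking alternative to KEYS; SCAN is not mentioned above and needs Redis 2.8 or later, so treat it as a suggestion. Both function names are invented for this example.

```python
def slowest_commands(client, count=50):
    """Return (duration_in_microseconds, command) pairs, slowest first."""
    rows = []
    for entry in client.slowlog_get(count):
        command = entry["command"]
        if isinstance(command, bytes):  # returned as bytes unless decoding is on
            command = command.decode("utf-8", errors="replace")
        rows.append((entry["duration"], command))
    return sorted(rows, reverse=True)


def iter_keys(client, pattern="*", count=1000):
    """Walk the keyspace incrementally with SCAN instead of blocking on KEYS."""
    return client.scan_iter(match=pattern, count=count)
```

KEYS walks the entire keyspace in a single command, and since Redis executes commands on one thread, everything else waits until it finishes. SCAN does the same walk in small steps.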
The next thing to do is: cache. Requests that get short-circuited before they hit the back end are free. We use Redis extensively, but that doesn't mean we ignore local memory too.
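A minimal sketch of that idea, assuming a redis-py style client with a `get()` method: a small in-process cache with a short time-to-live sitting in front of Redis. The class name `LocalCache` and its parameters are illustrative only, not the setup described above.

```python
import time


class LocalCache:
    """In-process read-through cache in front of a Redis-like client."""

    def __init__(self, client, ttl_seconds=5.0, clock=time.monotonic):
        self._client = client
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # served from local memory, no Redis call
        value = self._client.get(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

This version accepts data that is stale for up to `ttl_seconds` and never evicts entries, so a real deployment would want a size bound. The short TTL is a deliberate trade-off: it avoids the need for any invalidation messages between servers.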