High CPU, possibly due to context switching?

One of our servers is experiencing a very high CPU load with our application. We've looked at various stats and are having issues finding the source of the problem.

One

6 answers

無奈伤痛

2021-02-09 06:42
I think your constraints are unreasonable. Basically what you are saying is:
1. I can't change anything
2. I can't measure anything

Can you please speculate as to what my problem might be?

The real answer to this is that you need to hook a proper profiler to the application and you need to correlate what you see with CPU usage, Disk/Network I/O, and memory.

Remember the 80/20 rule of performance tuning: 80% of the gain will come from tuning your application. You might just have too much load for one VM instance, and it could be time to consider scaling horizontally, or vertically by giving the machine more resources. It could also be that any one of the 3 billion JVM settings is not in line with your application's execution specifics.

I assume the 3000-thread pool came from the famous "more threads = more concurrency = more performance" theory. The real answer is that a tuning change isn't worth anything unless you measure throughput and response time before and after the change and compare the results.
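
To make the before/after comparison concrete, here is a minimal sketch of such a measurement. The class name `PoolBenchmark`, its `run` method, and the choice of median latency are my own assumptions for illustration (Java 8+ syntax); a real test should replay a workload that resembles production rather than a synthetic task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: measures throughput and latency for one pool size.
public class PoolBenchmark {

    /** Outcome of one run: tasks completed per second and the median per-task latency. */
    public static final class Result {
        public final double throughputPerSec;
        public final long medianLatencyNanos;

        Result(double throughputPerSec, long medianLatencyNanos) {
            this.throughputPerSec = throughputPerSec;
            this.medianLatencyNanos = medianLatencyNanos;
        }
    }

    /**
     * Runs taskCount copies of work on a fixed pool of poolSize threads.
     * Latency is measured from submission to completion, so it includes queue wait.
     */
    public static Result run(int poolSize, int taskCount, Runnable work) {
        if (taskCount < 1) {
            throw new IllegalArgumentException("taskCount must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int i = 0; i < taskCount; i++) {
                long submitted = System.nanoTime();
                Callable<Long> task = () -> {
                    work.run();
                    return System.nanoTime() - submitted;
                };
                futures.add(pool.submit(task));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get()); // blocks until that task has finished
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            Collections.sort(latencies);
            double throughput = taskCount / (elapsed / 1e9);
            return new Result(throughput, latencies.get(latencies.size() / 2));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while benchmarking", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Run it once with the current pool size and once with the candidate size against the same workload, then compare both numbers. A change that raises throughput but also raises latency is not automatically a win.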

醉酒成梦

2021-02-09 06:42
If you can't profile, I'd recommend taking a thread dump or two and seeing what your threads are doing. Your app doesn't have to stop to do it:
1. http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/threads.html
2. http://java.net/projects/tda/
3. http://java.sys-con.com/node/1611555
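
If attaching an external tool is not an option either, a rough in-process equivalent is possible with the standard `Thread.getAllStackTraces()` call. The sketch below is only an illustration; the class name `ThreadStateSummary` and its two methods are hypothetical, and the output format is my own, not that of a real `jstack` dump.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: summarises what the JVM's threads are doing right now.
public class ThreadStateSummary {

    /** Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...). */
    public static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            Thread.State state = t.getState();
            Integer old = counts.get(state);
            counts.put(state, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Renders every live thread's name, state, and stack frames as text. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Taking two or three dumps a few seconds apart and looking for stacks that stay the same is usually more telling than a single snapshot. A large RUNNABLE count relative to the number of cores points at CPU contention, while many BLOCKED threads point at lock contention.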

醉话见心

2021-02-09 06:56

Seems to me the problem is the 100 CPU-bound threads more than anything else. The 3000-thread pool is basically a red herring, as idle threads don't consume much of anything. The I/O threads are likely sleeping "most" of the time, since I/O is measured on a geologic time scale in terms of computer operations.

You don't mention what the 100 CPU threads are doing, or how long they last, but if you want to slow down a computer, dedicating 100 threads to "run until the time slice says stop" will most certainly do it. Because you have 100 threads that are "always ready to run", the machine will context switch as fast as the scheduler allows. There will be pretty much zero idle time. Context switching will have an impact because you're doing it so often. Since the CPU threads are (likely) consuming most of the CPU time, your I/O "bound" threads are going to be waiting in the run queue longer than they're waiting for I/O. So even more processes are waiting (the I/O processes just bail out more often, since they quickly hit an I/O barrier that idles the process and lets the next one run).

No doubt there are tweaks here and there to improve efficiency, but 100 CPU threads are 100 CPU threads. Not much you can do there.
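
If you want to confirm which threads are actually burning the CPU, the standard `ThreadMXBean` can report per-thread CPU time. The following is a sketch under my own assumptions: the class name `CpuHogFinder`, the sampling approach, and the `limit` parameter are illustrative, and per-thread CPU timing is not supported on every JVM (in which case this returns an empty list).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples per-thread CPU time over a short window.
public class CpuHogFinder {

    /** CPU time one thread consumed during the sampling window. */
    public static final class Usage {
        public final String threadName;
        public final long cpuNanos;

        Usage(String threadName, long cpuNanos) {
            this.threadName = threadName;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to limit threads, ordered by CPU time used during sampleMillis. */
    public static List<Usage> topThreads(long sampleMillis, int limit) {
        List<Usage> usages = new ArrayList<>();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return usages; // this JVM cannot report per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // -1 if the thread has died
        }
        try {
            Thread.sleep(sampleMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return usages;
        }
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            ThreadInfo info = mx.getThreadInfo(ids[i]);
            if (before[i] < 0 || after < 0 || info == null) {
                continue; // thread ended during the window
            }
            usages.add(new Usage(info.getThreadName(), after - before[i]));
        }
        usages.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return usages.size() > limit ? new ArrayList<>(usages.subList(0, limit)) : usages;
    }
}
```

A thread whose CPU time over the window is close to the window length is effectively pinned to a core. If around 100 threads look like that on a machine with far fewer cores, the run queue is where your time is going.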

情深已故

2021-02-09 06:56

Usually, context switching between threads is very cheap computationally, but when it involves this many threads... you just can't know. You say upgrading to Java 1.6 EE is out of the question, but what about some hardware upgrades? It would probably provide a quick fix and shouldn't be that expensive...

佛祖请我去吃肉

2021-02-09 06:58
e.g. run a profiler on a similar machine.
- try a newer version of Java 6 or 7. (It may not make a difference, in which case don't bother upgrading production)
- try CentOS 6.x
- try not using VMware.
- try reducing the number of threads. You only have 8 cores.
You may find that all or none of the above options make a difference, but you won't know until you have a system you can test on with a known, repeatable workload.
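
For the "reduce the number of threads" option, one possible starting point on a test system is a pool capped at the core count with a bounded queue. This is only a sketch: the class name `BoundedPools`, the queue capacity, and the use of `CallerRunsPolicy` for back-pressure are my assumptions, not something taken from your setup. Threads that mostly wait on I/O may well need a larger cap than this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for a pool sized to the number of cores.
public class BoundedPools {

    /**
     * Fixed-size pool with a bounded queue. When the queue is full, the
     * submitting thread runs the task itself, which slows producers down
     * instead of piling up more runnable threads.
     */
    public static ThreadPoolExecutor cpuBoundPool(int cores, int queueCapacity) {
        return new ThreadPoolExecutor(
                cores, cores,                      // core size == max size
                0L, TimeUnit.MILLISECONDS,         // no idle timeout needed
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Same pool, sized from what the JVM reports for this machine. */
    public static ThreadPoolExecutor forThisMachine() {
        return cpuBoundPool(Runtime.getRuntime().availableProcessors(), 1000);
    }
}
```

Treat the core count as a first guess to measure against, not as the answer. Inside a VM, `availableProcessors()` reports what the guest sees, which may not match the physical host.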

被撕碎了的回忆

2021-02-09 06:58
So - can we rule out context switching or too-many-threads as the problem?

I think your concerns over thrashing are warranted. A thread pool with 3000 threads (700+ concurrent operations) on a 2 CPU VMware instance certainly seems like a problem that may be causing context switching overload and performance problems. Limiting the number of threads could give you a performance boost, although determining the right number is going to be difficult and will probably take a lot of trial and error.

we need some proof of an issue.

I'm not sure of the best way to answer, but here are some ideas:
- Watch the load average of the VM OS and the JVM. If you are seeing high load values (20+), that is an indicator that there are too many things in the run queues.
- Is there no way to simulate the load in a test environment so you can play with the thread pool numbers? If you run simulated load in a test environment with pool size of X and then run with X/2, you should be able to determine optimal values.
- Can you compare high load times of day with lower load times of day? Can you graph the number of responses against latency during these times to see if there is a tipping point in terms of thrashing?
- If you can simulate load, make sure you aren't just testing under the "drink from the fire hose" methodology. You need simulated load that you can dial up and down. Start at 10% and slowly increase the simulated load while watching throughput and latency. You should be able to see the tipping points by watching for throughput flattening or otherwise deflecting.
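
Once you have throughput numbers for a series of load steps, spotting the tipping point can be automated in a crude way. The sketch below is my own illustration: the class name `TippingPoint`, the `minGainFraction` threshold, and the "compare each step's slope to the first step's slope" rule are all assumptions, and it expects load values that strictly increase.

```java
// Hypothetical helper: finds where throughput stops scaling with load.
public class TippingPoint {

    /**
     * Returns the index of the first load step whose marginal throughput gain
     * falls below minGainFraction of the gain seen on the first step,
     * or -1 if throughput keeps scaling across all steps.
     */
    public static int find(double[] load, double[] throughput, double minGainFraction) {
        if (load.length != throughput.length || load.length < 3) {
            throw new IllegalArgumentException("need at least 3 matching samples");
        }
        // Gain per unit of load on the first step, used as the baseline.
        double baseline = (throughput[1] - throughput[0]) / (load[1] - load[0]);
        for (int i = 2; i < load.length; i++) {
            double slope = (throughput[i] - throughput[i - 1]) / (load[i] - load[i - 1]);
            if (slope < baseline * minGainFraction) {
                return i; // throughput has flattened or started to drop here
            }
        }
        return -1;
    }
}
```

For example, with load steps of 10, 20, 30, 40, 50 percent and throughput of 100, 200, 300, 310, 305 requests per second, `find(load, throughput, 0.5)` returns 3, meaning the 40% step is where scaling broke down. Plotting latency next to it is still worthwhile, since latency often degrades before throughput visibly flattens.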