CPU underutilized. Due to blocking I/O?

问题

I am trying to find where lies the bottleneck of a C# server application which underutilize CPU. I think this may be due to poor disk I/O performance and has nothing to do with the application itself but I am having trouble making a fact out of this supposition.

The application reads messages from a local MSMQ queue, does some processing on each messages and after processing the messages, sends out response messages to another local MSMQ queue.

I am using an async loop to read messages from queue, dequeuing them as fast as possible and dispatching them for processing using Task.Run to launch the processing of each messages (and do not await on this Task.Run .. just attaching a continuation only faulted on it to log error). Each messages is processed concurrently, i.e there is no need to wait for a message to be fully processed before processing the next one.

At the end of the processing of a message, I am using the Send method of MessageQueue (somehow asynchronous but not really because it has to wait on disk write before returning -see System.Messaging - why MessageQueue does not offer an asynchronous version of Send).

For the benchmarks I am queuing 100K messages in the queue (approx 100MB total size for 100K messages) and then I launch the program. On two of my personal computers (SSD HD on one and SATA2 HD on the other with i7 CPU quadcores -8 logical proc-) I reach ~95% CPU usage for the duration of the program lifecyle (dequeuing the 100K messages, processing them and sending responses). Messages are dequeued as fast a possible, processed as fast as possible (CPU involved here) and then response for each message sent to different local queue.

Now on a virtual machine running non HT dual core CPU (have no idea what is the underlying disk but seems far less performant than mines... during benchmark, with Perfmon I can see avg disk sec/write arround 10-15 ms on this VM, whereas it is arround 2ms on my personal machines) when I am running the same bench, I only reach ~55% CPU (when I am running the same bench on the machine without sending response messages to queue I reach ~90% CPU).

I don't really understand what is the problem here. Seems clear that sending message to the queue is the problem and slows down the global processing of the program (and dequeuing of messages to be processed), but why would that be considering that I am using Task.Run to launch processing of each dequeued message and ultimately response sending, I would not expect CPU to be underutilized. Unless when one thread is sending a message it blocks other threads to run on the same core while it waits for the return (disk write) in which case it would maybe make sense considering latency is much higher than on my personal computers, but a thread waiting for I/O should not prevent other threads from running.

I am really trying to understand why I am not reaching at least 95% cpu usage on this machine. I am blindly saying this is due to poorer disk i/o performance, but still I don't see why it would lead to CPU underutilization considering I am running processing concurrently using Task.Run. It could also be some system problem completely unrelated to disk, but considering that MessageQueue.Send seems to be the problem and that this method ultimately writes messages to a memory mapped file + disk, I don't see where the performance issue could come from other than disk.

It is of course for sure a system performance issue as the program maximize CPU usage on my own computers, but I need to find what the bottleneck is exactly on the VM system, and why exactly it is affecting the concurrency / speed of my application.

Any idea ?

回答1:

To examine poor disc and or cpu utilization there is only one tool: Windows Performance Toolkit. For an example how to use it see here. You should get the latest one from the Windows 8.1 SDK (requires .NET 4.5.1) which gives you most capabilities but the one from the Windows 8 SDK is also fine.

There you get graphs % CPU Utilization and % Disc Utilization. If either one is at 100% and the other one is low then you have found the bottleneck. Since it is a system wide profiler you can check if the msmq service is using the disc badly or you or someone else (e.g. virus scanner is a common issue).

You can directly get to your call stacks and check which process and thread did wake your worker thread up which is supposed to run at full speed. Then you can jump to the readying thread and process and check what it did do before it could ready your thread. That way you can directly verify what was hindering it so long.

No more guessing. You can really see what the system is doing.

To analyze further enable in the CPU Usage Precise view the following columns:

NewProcess
NewThreadId
NewThreadStack(Frame Tags)
ReadyingProcess
ReadyingThreadId
Ready(us) Sum
Wait(us) Sum
Wait(us)
%CPU Usage

Then drill down for a call stack in your process to see where high Wait(us) times do occur in a thread that is supposed to run at full speed.. You can drill down to one single event until you can go no further. Then you will see values in Readying Process and ReadyingThreadId. Go to that process/thread (it can be your own) and repeat the process until you end up in some blocking operation which does either involve disc IO or sleeps or a long running device driver call (e.g virus scanner or the vm driver).

回答2:

If the Disk I/O performance counters don't look abnormally high, I'd look next at the hypervisor level. Assuming you're running the exact same code, using a VM adds latency to the entire stack (CPU, RAM, Disk). You can perhaps tweak CPU Scheduling at the hypervisor level and see if this will increase CPU utilization.

I'd also consider using a RAMDisk temporarily for performance testing. This would eliminate the Disk/SAN latency and you can see if that fixes your problem.

来源：https://stackoverflow.com/questions/19748129/cpu-underutilized-due-to-blocking-i-o

标签

performance

async-await

msmq