Diagnosing runaway CPU in a .Net production application

野性不改 2021-02-01 10:12

Does anyone know of a tool that can help me figure out why we are seeing runaway CPU in a managed app?

What I am not looking for:

10 answers
  •  余生分开走
    2021-02-01 10:35

    The basic solution

    1. Grab managed stack traces of each managed thread.
    2. Grab basic thread statistics for each managed thread (user-mode and kernel-mode time).
    3. Wait a bit
    4. Repeat (1-3)
    5. Analyze the results, find the threads that consumed the most CPU time, and present the stack traces of those threads to the user.
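
    The five steps above can be sketched in a language-neutral way. The block below is only an illustration, not the code of any real tool: `ThreadSample`, `collect`, `hottest_threads`, and the `snapshot` callback are hypothetical names, and `snapshot` stands in for whatever actually reads the stacks and thread times (for example, something built on ICorDebug). The time values are assumed to be cumulative counters in any consistent unit.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ThreadSample:
    """One observation of one managed thread (hypothetical structure)."""
    thread_id: int
    kernel_time: int         # cumulative kernel-mode time
    user_time: int           # cumulative user-mode time
    stack: Tuple[str, ...]   # managed frames, innermost first


def collect(snapshot: Callable[[], List[ThreadSample]],
            samples: int = 10,
            interval_s: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> List[List[ThreadSample]]:
    """Steps 1-4: take a snapshot of all threads, wait, and repeat."""
    rounds = []
    for i in range(samples):
        rounds.append(snapshot())
        if i < samples - 1:      # no need to wait after the last snapshot
            sleep(interval_s)
    return rounds


def hottest_threads(rounds: List[List[ThreadSample]], top: int = 3):
    """Step 5: rank threads by CPU time consumed between first and last sighting."""
    first: Dict[int, int] = {}
    last: Dict[int, int] = {}
    stacks = defaultdict(list)
    for round_ in rounds:
        for s in round_:
            total = s.kernel_time + s.user_time
            first.setdefault(s.thread_id, total)   # remember the earliest value only
            last[s.thread_id] = total              # keep overwriting with the latest
            stacks[s.thread_id].append(s.stack)
    ranked = sorted(last, key=lambda t: last[t] - first[t], reverse=True)
    return [(t, last[t] - first[t], stacks[t]) for t in ranked[:top]]
```

    Calling `hottest_threads(collect(my_snapshot))` returns the busiest thread ids with their CPU delta and every stack seen for them, so a method that shows up in most of the stacks of a hot thread is the first suspect.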

    Managed vs. Unmanaged Stack Traces

    There is a big difference between managed and unmanaged stack traces. Managed stack traces contain information about actual .Net calls, whereas unmanaged ones contain a list of unmanaged function pointers. Since .Net code is jitted, the addresses of the unmanaged function pointers are of little use when diagnosing a problem in a managed application.


    How do you get a managed stack trace for an arbitrary .Net process?

    There are two ways you could get managed stack traces for a managed application.

    • Use CLR profiling (aka. ICorProfiler API)
    • Use CLR Debugging (aka. ICorDebug API)

    What is better in production?

    The CLR Debugging APIs have a very important advantage over the profiling ones: they allow you to attach to a running process. This can be critical when diagnosing performance issues in production. Quite often, runaway CPU pops up after days of application use because some unexpected branch of code starts executing. At that point, restarting the app (in order to profile it) is not an option.

    cpu-analyzer.exe

    So, I wrote a little tool that needs no installer and performs the basic solution above using ICorDebug. It's based on the mdbg source, all merged into a single exe.

    It takes a configurable (default is 10) number of stack traces for all managed threads, at a configurable interval (default is 1000ms).

    Here is a sample output:

    C:\>cpu-analyzer.exe evilapp
    ------------------------------------
    4948
    Kernel Time: 0 User Time: 89856576
    EvilApp.Program.MisterEvil
    EvilApp.Program.b__0
    System.Threading.ExecutionContext.Run
    System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal
    System.Threading._ThreadPoolWaitCallback.PerformWaitCallback
    
    ... more data omitted ...
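
    To put the time counters in the sample above into perspective, here is a small sketch that converts them to seconds. It rests on an assumption that the answer does not state: that the values are in 100-nanosecond units, the unit Win32 thread times (FILETIME) use. `parse_times` and `TICKS_PER_SECOND` are hypothetical names for this illustration.

```python
import re

# Assumption: counters are in 100 ns units (as in Win32 FILETIME values).
TICKS_PER_SECOND = 10_000_000

_TIMES = re.compile(r"Kernel Time:\s*(\d+)\s+User Time:\s*(\d+)")


def parse_times(line: str):
    """Return (kernel_seconds, user_seconds) parsed from a 'Kernel Time: ... User Time: ...' line."""
    match = _TIMES.search(line)
    if match is None:
        raise ValueError(f"not a thread time line: {line!r}")
    kernel_ticks, user_ticks = (int(group) for group in match.groups())
    return kernel_ticks / TICKS_PER_SECOND, user_ticks / TICKS_PER_SECOND


kernel_s, user_s = parse_times("Kernel Time: 0 User Time: 89856576")
print(f"kernel={kernel_s:.2f}s user={user_s:.2f}s")
```

    Under that assumption, thread 4948 spent roughly 9 seconds in user mode and none in kernel mode, which is what a tight managed loop would look like.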
    

    Feel free to give the tool a shot. It can be downloaded from my blog.

    EDIT

    Here is a thread showing how I use cpu-analyzer to diagnose such an issue in a production app.
