I am soon going to be tasked with doing a proper memory profile of code that is written in C/C++ and uses CUDA to take advantage of GPU processing.
My initial thou
You could try Google's PerfTools' Heap-Profiler:
http://google-perftools.googlecode.com/svn/trunk/doc/heapprofile.html
It's very lightweight; it literally replaces malloc/calloc/realloc/free to add instrumentation code. It's primarily tested on Linux platforms.
If you have compiled with debugging symbols, and your third-party libraries come with debug-version variants, PerfTools should do very well. If you don't have debug-symbol libraries, build your own code with debug symbols anyway. That gives you detailed numbers for your code, and everything left over can be attributed to the third-party library.
Maybe valgrind and the Massif tool?
You could use the profiler included in Visual Studio 2010 Premium and Ultimate.
It lets you choose between different methods of performance measurement. The most useful for you will probably be CPU sampling: it freezes your program at arbitrary time intervals and records which functions are currently executing, so it does not make your program run substantially slower.
If you don't want to use an "external" tool, you can try to use tools like:
mtrace
It installs handlers for malloc, realloc and free and logs every operation to a file. See the Wikipedia article I linked for code usage examples.
dmalloc
It's a library you can link into your code; it can find memory leaks, off-by-one errors and use of invalid addresses. You can also disable it at compile time with -DDMALLOC_DISABLE.
Anyway, I would rather not take this approach. Instead, I suggest you stress test your application while running it on a test server under valgrind (or an equivalent tool) to make sure your memory allocation is correct, and then run the application in production without any memory allocation checking to maximize speed. In the end, though, it depends on what your application does and what your needs are.
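For reference, the mtrace option mentioned above could look roughly like the sketch below. It assumes glibc on Linux; the function name traced_region is hypothetical and only marks the region you want logged. mtrace() writes its log to the file named by the MALLOC_TRACE environment variable and does nothing if that variable is unset. On recent glibc releases you may also need to preload libc_malloc_debug.so for the trace to be produced, so treat this as a starting point rather than a guaranteed recipe.

```c
#include <mcheck.h>   /* mtrace(), muntrace() (glibc-specific) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: traces the allocations made inside one region.
 * Returns the number of bytes it allocated, or 0 if malloc failed. */
size_t traced_region(size_t n)
{
    mtrace();                 /* start logging malloc/realloc/free calls */

    char *buf = malloc(n);    /* allocation that should show up in the log */
    if (buf == NULL) {
        muntrace();
        return 0;
    }
    memset(buf, 0, n);        /* stand-in for real work on the buffer */
    free(buf);                /* matching free, so no leak is reported */

    muntrace();               /* stop logging */
    return n;
}
```

A typical run would set MALLOC_TRACE to a log file path before starting the program and then feed that log to the mtrace script that ships with glibc, which lists allocations that were never freed.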
Maybe the linker option --wrap=symbol can help you. A really good example can be found in man ld.
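A minimal sketch of the --wrap idea, assuming GNU ld and wrapping only malloc. With -Wl,--wrap=malloc the linker redirects undefined references to malloc to __wrap_malloc and resolves __real_malloc to the original function. The counters, the accessor functions and the USE_LD_WRAP macro are hypothetical names introduced for this illustration; the macro only exists so the file also builds without the linker flag.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_LD_WRAP
/* Resolved by ld to the real malloc when linking with -Wl,--wrap=malloc. */
void *__real_malloc(size_t size);
#else
/* Fallback (assumption for this sketch): call malloc directly. */
#define __real_malloc malloc
#endif

/* Simple counters; not thread-safe, which is fine for a sketch only. */
static size_t g_alloc_calls;
static size_t g_alloc_bytes;

/* Every malloc call that the linker wrapped ends up here. */
void *__wrap_malloc(size_t size)
{
    void *p = __real_malloc(size);
    if (p != NULL) {
        g_alloc_calls++;
        g_alloc_bytes += size;
    }
    return p;
}

/* Hypothetical accessors for reading the statistics. */
size_t wrapped_alloc_calls(void) { return g_alloc_calls; }
size_t wrapped_alloc_bytes(void) { return g_alloc_bytes; }
```

To use it for real you would compile with -DUSE_LD_WRAP and link with -Wl,--wrap=malloc. Note that wrapping happens at link time, so calls made inside an already-linked shared library are not redirected; this limits how much it tells you about a third-party library you only have as a .so.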
To track the real-time memory consumption of my programs on Linux, I simply read /proc/[pid]/stat. It's a fairly light operation and could be negligible in your case if the third-party library you want to track does substantial work. If you want memory information while the third-party library is working, you can read the stat file from an independent thread or from another process. (Memory peaks rarely happen before or after function calls!)
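Reading the stat file could be sketched as follows. This assumes the field layout documented in man 5 proc, where field 23 is vsize (virtual memory size in bytes) and field 24 is rss (resident set size in pages). The function name read_proc_stat_memory is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the virtual size and resident set size of
 * process `pid` from /proc/<pid>/stat, both converted to bytes.
 * Returns 0 on success, -1 on failure. */
int read_proc_stat_memory(pid_t pid, unsigned long *vsize_bytes, long *rss_bytes)
{
    char path[64];
    char buf[2048];

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    if (n == 0)
        return -1;
    buf[n] = '\0';

    /* Field 2 (comm) is wrapped in parentheses and may contain spaces,
     * so resume parsing after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Skip fields 3 through 22 (20 fields) to reach vsize. */
    for (int i = 0; i < 20; i++) {
        while (*p == ' ')
            p++;
        while (*p != '\0' && *p != ' ')
            p++;
    }

    long rss_pages;
    if (sscanf(p, "%lu %ld", vsize_bytes, &rss_pages) != 2)
        return -1;

    *rss_bytes = rss_pages * sysconf(_SC_PAGESIZE);  /* pages to bytes */
    return 0;
}
```

Calling this with getpid() from a sampling thread, or with the target's pid from a separate monitoring process, gives the periodic readings described above; how often to sample is a trade-off you would need to tune for your workload.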
For the CUDA/GPU part, I think gDEBugger could help you. I am not sure, but I believe its memory analyzer does not affect performance much.