We have a complex program that is working well on heavy duty input (any input actually) with no multithreading implemented.
We've implemented multithreading with a threadpool, and given these input parameters I get these results:
(Note: where I say "no errors", I mean I tested with valgrind -v, and where I say "no memory leaks", I mean I tested with valgrind --leak-check=full -v.)
- small_file: Runs successfully with more than 1 workers (threads), no valgrind errors, no memory leaks
- medium_file: With 1 worker it runs successfully, with no errors or memory leaks. With > 1 workers, I get: a. usually a heap-corruption error, b. a double-free. When running under valgrind -v with > 1 workers, the program completes successfully. Also, valgrind prints no errors, that is: ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2).
Now that I don't get any errors from valgrind to start with, what can I do to find the memory corruption problem in this complex and big application?
Development environment:
Ubuntu, 64bit, gcc version: 4.7.2 and 4.8.1 (different computers, newer version of Ubuntu).
With > 1 workers, I get: a. usually heap-corruption error, b.double-free. When running with valgrind -v with > 1 workers the program completes successfully
Based on the above symptoms, it looks to me like there is clearly some sort of synchronization problem in your program. It looks like your program shares heap memory between the threads, so whenever a data race occurs you run into problems.
You have also mentioned that when you run valgrind -v, your program completes successfully. This indicates that the synchronization problem depends on sequence and timing: Valgrind runs threads one at a time and slows the program down considerably, so the timing under it differs from a native run. These are among the most difficult bugs to find. We should also remember that dynamic tools do not give any warning until the program actually executes something wrong. I mean, there could be a problem in the program, but the sequence of execution (since the problem is timing-related) determines whether the tools capture the failure or not.
Having said that, I think there is no shortcut for finding such bugs in big programs. However, I strongly suspect that there is a data race which is leading to the memory corruption/double free. So you may want to use Helgrind to check for data races and threading problems that might be leading to the memory corruption.
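To make the suspicion concrete, here is a minimal sketch of one way a thread pool can end up with a double free, and how a lock avoids it. Everything in it (struct job, job_create, job_release) is hypothetical and not taken from the asker's program. If the decrement of refs were done without the mutex, two workers could both see the count reach zero and both call free; an unlocked variant like that is the kind of access Helgrind (valgrind --tool=helgrind) is designed to report.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work item shared by several pool workers. */
struct job {
    pthread_mutex_t lock;  /* protects refs */
    int refs;              /* number of workers still using the job */
    char *payload;         /* heap data owned by the job */
};

/* Create a job that `refs` workers will each release exactly once. */
struct job *job_create(size_t payload_size, int refs)
{
    struct job *j = malloc(sizeof *j);
    if (j == NULL)
        return NULL;
    j->payload = malloc(payload_size);
    if (j->payload == NULL) {
        free(j);
        return NULL;
    }
    pthread_mutex_init(&j->lock, NULL);
    j->refs = refs;
    return j;
}

/*
 * Drop one reference. Returns 1 if this call freed the job, 0 otherwise.
 * The decrement and the "did we hit zero?" test happen under the lock,
 * so exactly one caller observes zero and performs the free.
 */
int job_release(struct job *j)
{
    int last;

    pthread_mutex_lock(&j->lock);
    assert(j->refs > 0);        /* debug check: catches some over-releases */
    last = (--j->refs == 0);
    pthread_mutex_unlock(&j->lock);

    if (!last)
        return 0;

    /* No other worker holds a reference any more, so freeing is safe. */
    pthread_mutex_destroy(&j->lock);
    free(j->payload);
    free(j);
    return 1;
}
```

Each worker would call job_release once when it is done with the job; only the last caller frees the memory. Link with -pthread when the program also creates threads.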
Now that I don't get any errors from valgrind to start with, what can I do to find the memory corruption problem in this complex and big application?
Well, let me describe to you what I did to find memory leaks in Microsoft's implementation of JavaScript back in the 1990s.
First I ensured that in the debug version of my program, as many memory allocations as possible were being routed to the same helper methods. That is, I redefined malloc, new, etc., to all be synonyms for an allocator that I wrote myself.
That allocator was just a thin shell around an operating system virtual heap memory allocator, but it had some extra smarts. It allocated extra memory at the beginning and end of the block and filled that with sentinel values, a threadsafe count of the number of allocations so far, and a threadsafe doubly-linked list of all allocations. The "free" routine would verify that the sentinel values on both sides were still intact; if not, then there's a memory corruption somewhere. It would unlink the block from the linked list and free it.
At any point I could ask the memory manager for a list of all the outstanding blocks in memory in the order they had been allocated. Any items left in the list when the DLL was unloaded were memory leaks.
Those tools enabled me to find memory leaks and memory corruptions in real time very easily.
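The allocator described above can be sketched roughly as follows. This is only an illustration under assumptions: it wraps plain malloc/free rather than an operating system heap, and the names (dbg_malloc, dbg_free, dbg_check, dbg_live_count), the guard size, and the guard byte are all made up for the example. It is not the original Microsoft code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xAB  /* assumed sentinel value */
#define GUARD_SIZE 16    /* assumed sentinel length in bytes */

struct dbg_header {
    struct dbg_header *prev, *next;   /* doubly-linked list of live blocks */
    size_t size;                      /* size the caller asked for */
    unsigned long serial;             /* allocation sequence number */
    unsigned char front[GUARD_SIZE];  /* sentinel just before the user data */
};

static pthread_mutex_t dbg_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dbg_header *dbg_head;   /* most recent live allocation */
static unsigned long dbg_serial;      /* allocations made so far */

void *dbg_malloc(size_t size)
{
    struct dbg_header *h;
    unsigned char *user;

    if (size > (size_t)-1 - sizeof *h - GUARD_SIZE)
        return NULL;                  /* request would overflow size_t */
    h = malloc(sizeof *h + size + GUARD_SIZE);
    if (h == NULL)
        return NULL;
    user = (unsigned char *)(h + 1);
    h->size = size;
    memset(h->front, GUARD_BYTE, GUARD_SIZE);     /* front sentinel */
    memset(user + size, GUARD_BYTE, GUARD_SIZE);  /* back sentinel */

    pthread_mutex_lock(&dbg_lock);
    h->serial = ++dbg_serial;
    h->prev = NULL;
    h->next = dbg_head;
    if (dbg_head != NULL)
        dbg_head->prev = h;
    dbg_head = h;
    pthread_mutex_unlock(&dbg_lock);
    return user;
}

/* Returns 1 if both sentinels of a live block are intact, 0 otherwise. */
int dbg_check(const void *p)
{
    const struct dbg_header *h;
    const unsigned char *back;
    size_t i;

    assert(p != NULL);
    h = (const struct dbg_header *)p - 1;
    back = (const unsigned char *)p + h->size;
    for (i = 0; i < GUARD_SIZE; i++)
        if (h->front[i] != GUARD_BYTE || back[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Unlinks and frees the block. Returns 0 if corruption was detected. */
int dbg_free(void *p)
{
    struct dbg_header *h;
    int ok;

    if (p == NULL)
        return 1;
    h = (struct dbg_header *)p - 1;
    ok = dbg_check(p);

    pthread_mutex_lock(&dbg_lock);
    if (h->prev != NULL) h->prev->next = h->next; else dbg_head = h->next;
    if (h->next != NULL) h->next->prev = h->prev;
    pthread_mutex_unlock(&dbg_lock);
    free(h);
    return ok;
}

/* Number of blocks still allocated; anything left at exit is a leak. */
size_t dbg_live_count(void)
{
    const struct dbg_header *h;
    size_t n = 0;

    pthread_mutex_lock(&dbg_lock);
    for (h = dbg_head; h != NULL; h = h->next)
        n++;
    pthread_mutex_unlock(&dbg_lock);
    return n;
}
```

In a debug build, malloc and free could be routed to these functions, and a failed check could log the block's serial number or abort so the corrupted allocation can be identified. Walking the list from the tail gives the outstanding blocks in allocation order.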
Please use a core dump (mostly useful for double-free and "glibc detected" type errors):
Compile your program with the gcc -g option to include debug information.
ulimit -a
shows the current limits, including the maximum core file size.
ulimit -c unlimited
removes the limit on the core file size.
Now run your program. When it crashes, a file named "core" is generated in your current directory (the exact name and location depend on the system's core_pattern setting).
Then analyze it with GDB as below:
gdb ./yourprogram core
(gdb) bt
This prints the stack at the point of the crash, which shows where the problem was detected.
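If changing the shell's limits is inconvenient (for example when the program is started by a script you don't control), the same effect as ulimit -c can be approximated from inside the program with getrlimit/setrlimit. The function name enable_core_dumps below is just an assumption for this sketch; it raises the soft core-file limit to the hard limit, which an unprivileged process is allowed to do.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 * If the hard limit itself is 0, core dumps stay disabled.
 */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;     /* soft limit := hard limit */
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* Debug-build sanity check: re-read the limit and confirm it took effect. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    assert(lim.rlim_cur == lim.rlim_max);
    return 0;
}
```

Calling this at the start of main() in a debug build means a later heap-corruption abort can leave a core file behind without any shell setup.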
If you run into any difficulty, let me know.
Source: https://stackoverflow.com/questions/22617714/having-hard-time-tracking-memory-corruption-when-running-with-valgrind-runs-co