I have been debugging a crash for days now that occurs in the depths of OpenSSL (discussion with the maintainers here). I took some time investigating so I'll try to make this
I finally found the problem and solved it.
It turned out that some instruction was writing bytes past the end of the allocated heap buffer (hence the 0x00000000 instead of the expected 0xfdfdfdfd).
In debug mode this overwrite of the memory guards remains undetected until the memory is freed with free() or reallocated with realloc(). This is what caused the HEAP CORRUPTION message I faced.
I expect that in release mode this could have had dramatic effects, like overwriting a valid memory block used somewhere else in the application.
For future reference, for people facing similar issues, here is what I did:
OpenSSL provides a CRYPTO_set_mem_ex_functions() function, defined like so:
int CRYPTO_set_mem_ex_functions(void *(*m) (size_t, const char *, int),
                                void *(*r) (void *, size_t, const char *, int),
                                void (*f) (void *));
This function allows you to hook in and replace the memory allocation and freeing functions used within OpenSSL. The nice thing is the addition of the const char *, int parameters, which OpenSSL fills in for you with the file name and line number of the allocation.
Armed with this information, it was easy to find out the place where the memory block was allocated. I could then step through the code while looking at the memory inspector waiting for the memory block to be corrupted.
In my case, what happened was:
if (!combine) {
    *pval = OPENSSL_malloc(it->size); // <== The allocation is here.
    if (!*pval)
        goto memerr;
    memset(*pval, 0, it->size);
    asn1_do_lock(pval, 0, it);
    asn1_enc_init(pval, it);
}
for (i = 0, tt = it->templates; i < it->tcount; tt++, i++) {
    pseqval = asn1_get_field_ptr(pval, tt);
    if (!ASN1_template_new(pseqval, tt))
        goto memerr;
}
ASN1_template_new() is called on the 3 sequence elements to initialize them.
It turns out that ASN1_template_new() in turn calls asn1_item_ex_combine_new(), which does this:
if (!combine)
    *pval = NULL;
Since pval is an ASN1_VALUE**, this instruction writes 8 bytes on Windows x64 systems instead of the intended 4 bytes, leading to memory corruption for the last element of the list.
For the full discussion on how this problem was solved upstream, see this thread.
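In general the possibilities include:
Writing outside the bounds of an allocated block. malloc() and friends put extra bookkeeping information there, such as the size, and probably a sanity check, which you will fail by overwriting it.
Freeing something that was not malloc()-ed.
Continuing to use something that was already free()-d.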