I have an odd bug in my program: it appears that malloc() is causing a SIGSEGV, which, as far as I understand, does not make any sense. I am using a library called simclist for dynamic lists.
Here is a struct that is referenced later:
typedef struct {
int msgid;
int status;
void* udata;
list_t queue;
} msg_t;
And here is the code:
msg_t* msg = (msg_t*) malloc( sizeof( msg_t ) );
msg->msgid = msgid;
msg->status = MSG_STAT_NEW;
msg->udata = udata;
list_init( &msg->queue );
list_init is where the program fails. Here is the code for list_init:
/* list initialization */
int list_init(list_t *restrict l) {
if (l == NULL) return -1;
srandom((unsigned long)time(NULL));
l->numels = 0;
/* head/tail sentinels and mid pointer */
l->head_sentinel = (struct list_entry_s *)malloc(sizeof(struct list_entry_s));
l->tail_sentinel = (struct list_entry_s *)malloc(sizeof(struct list_entry_s));
l->head_sentinel->next = l->tail_sentinel;
l->tail_sentinel->prev = l->head_sentinel;
l->head_sentinel->prev = l->tail_sentinel->next = l->mid = NULL;
l->head_sentinel->data = l->tail_sentinel->data = NULL;
/* iteration attributes */
l->iter_active = 0;
l->iter_pos = 0;
l->iter_curentry = NULL;
/* free-list attributes */
l->spareels = (struct list_entry_s **)malloc(SIMCLIST_MAX_SPARE_ELEMS * sizeof(struct list_entry_s *));
l->spareelsnum = 0;
#ifdef SIMCLIST_WITH_THREADS
l->threadcount = 0;
#endif
list_attributes_setdefaults(l);
assert(list_repOk(l));
assert(list_attrOk(l));
return 0;
}
The line beginning l->spareels = (struct list_entry_s **)malloc(SIMCLIST_MAX_SPARE_ELEMS * is where the SIGSEGV occurs, according to the stack trace. I am using gdb/nemiver for debugging but am at a loss. The first time this function is called it works fine, but it always fails the second time. How can malloc() cause a SIGSEGV?
This is the stack trace:
#0 ?? () at :0
#1 malloc () at :0
#2 list_init (l=0x104f290) at src/simclist.c:205
#3 msg_new (msg_switch=0x1050dc0, msgid=8, udata=0x0) at src/msg_switch.c:218
#4 exread (sockfd=8, conn_info=0x104e0e0) at src/zimr-proxy/main.c:504
#5 zfd_select (tv_sec=0) at src/zfildes.c:124
#6 main (argc=3, argv=0x7fffcabe44f8) at src/zimr-proxy/main.c:210
Any help or insight would be much appreciated!
malloc can segfault, for example, when the heap is corrupted. Check that you are not writing beyond the bounds of any previous allocation.
The memory violation probably occurs in another part of your code. If you are on Linux, you should definitely try valgrind. I would never trust my own C programs unless they pass valgrind.
EDIT: another useful tool is Electric Fence. Glibc also provides the MALLOC_CHECK_ environment variable to help debug memory problems. These two methods do not slow the program down as much as valgrind does.
You have probably corrupted your heap somewhere before this call, either by a buffer overflow or by calling free with a pointer that wasn't allocated by malloc (or that was already freed).
If the internal data structures used by malloc get corrupted this way, malloc is using invalid data and might crash.
There are a myriad ways of triggering a core dump from malloc() (and realloc() and calloc()). These include:
- Buffer overflow: writing beyond the end of the allocated space (trampling control information that malloc() was keeping there).
- Buffer underflow: writing before the start of the allocated space (trampling control information that malloc() was keeping there).
- Freeing memory that was not allocated by malloc(). In a mixed C and C++ program, that would include freeing memory allocated in C++ by new.
- Freeing a pointer that points part way through a memory block allocated by malloc(), which is a special case of the previous case.
- Freeing a pointer that was already freed: the notorious 'double free'.
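Two of the cases above (double free and overflowing a copy) can often be avoided with small habits. Here is a minimal sketch; the helper names free_and_clear and copy_string are made up for illustration and are not part of any library:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free *pp and reset it to NULL, so that a repeated
 * call is a harmless no-op (free(NULL) is defined to do nothing). */
static void free_and_clear(char **pp) {
    free(*pp);
    *pp = NULL;
}

/* Hypothetical helper: copy src into a fresh heap buffer sized to fit,
 * including the terminating NUL, so the copy cannot run past the end. */
static char *copy_string(const char *src) {
    size_t n = strlen(src) + 1;
    char *dst = malloc(n);
    if (dst != NULL)
        memcpy(dst, src, n);
    return dst;
}
```

This only protects the pointer variable you pass in; other copies of the same pointer can still be freed twice, so it narrows the problem rather than eliminating it.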
Using a diagnostic version of malloc(), or enabling diagnostics in your system's standard version, may help identify some of these problems. For example, it may be able to detect small underflows and overflows (because it allocates extra space to provide a buffer zone around the space that you requested), and it can probably detect attempts to free memory that was not allocated, memory that was already freed, or pointers part way through an allocated block, because it stores that information separately from the allocated space. The cost is that the debugging version takes more space. A really good allocator will be able to record the stack trace and line numbers to tell you where the allocation occurred in your code, or where the first free occurred.
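To illustrate the buffer-zone idea, here is a rough sketch of a wrapper that pads each allocation with guard bytes and checks them later. The names guard_malloc, guard_check and guard_free, and the sizes and fill byte, are assumptions made for this example; real diagnostic allocators are considerably more thorough.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 16   /* stores the requested size; keeps user data aligned */
#define GUARD_SIZE  16   /* bytes of buffer zone on each side */
#define GUARD_BYTE  0xA5 /* arbitrary fill pattern */

/* Layout of the raw block: [header][front guard][user data][rear guard] */
void *guard_malloc(size_t n) {
    if (n > SIZE_MAX - HEADER_SIZE - 2 * GUARD_SIZE)
        return NULL;                       /* total size would overflow */
    unsigned char *raw = malloc(HEADER_SIZE + GUARD_SIZE + n + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);             /* remember the requested size */
    memset(raw + HEADER_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(raw + HEADER_SIZE + GUARD_SIZE + n, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE + GUARD_SIZE;
}

/* Returns 0 if both guards are intact, -1 if the front guard was
 * overwritten (underflow), 1 if the rear guard was overwritten (overflow). */
int guard_check(const void *p) {
    const unsigned char *user = p;
    const unsigned char *raw = user - GUARD_SIZE - HEADER_SIZE;
    size_t n;
    memcpy(&n, raw, sizeof n);
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (raw[HEADER_SIZE + i] != GUARD_BYTE)
            return -1;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (user[n + i] != GUARD_BYTE)
            return 1;
    return 0;
}

/* Checks the guards, releases the raw block, and reports the check result. */
int guard_free(void *p) {
    if (p == NULL)
        return 0;
    int status = guard_check(p);
    free((unsigned char *)p - GUARD_SIZE - HEADER_SIZE);
    return status;
}
```

A sketch like this only catches overruns smaller than the guard zone, and only when the check runs, so it tells you which block was damaged but not which statement did the damage. That is where tools such as valgrind are more useful.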
You should try to debug this code in isolation, to see if the problem is actually located where the segfault is generated. (I suspect that it is not).
This means:
#1: Compile the code with -O0, to make sure that gdb gets correct line numbering information.
#2: Write a unit test which calls this part of the code.
My guess is that the code will work correctly when used separately. You can then test your other modules in the same way, until you find out what causes the bug.
Using Valgrind, as others have suggested, is also a very good idea.
The code is problematic. If malloc returns NULL, this case is not handled correctly in your code. You simply assume that memory has been allocated for you when it actually has not been. This can cause memory corruption.
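As a sketch of what checking the return values could look like, here is a cut-down version using stand-in types. Everything prefixed mini_ or MINI_ is hypothetical and is not simclist's real definition; the status value 0 is just a placeholder for MSG_STAT_NEW.

```c
#include <stdlib.h>

/* Stand-in types for illustration only; NOT simclist's definitions. */
struct mini_entry { struct mini_entry *next, *prev; void *data; };
typedef struct {
    struct mini_entry *head_sentinel;
    struct mini_entry *tail_sentinel;
    struct mini_entry **spareels;
    unsigned int numels;
} mini_list_t;

#define MINI_MAX_SPARE 5 /* hypothetical value */

/* Returns 0 on success, -1 if l is NULL or any allocation fails. */
int mini_list_init(mini_list_t *l) {
    if (l == NULL)
        return -1;
    l->head_sentinel = malloc(sizeof *l->head_sentinel);
    l->tail_sentinel = malloc(sizeof *l->tail_sentinel);
    l->spareels = malloc(MINI_MAX_SPARE * sizeof *l->spareels);
    if (!l->head_sentinel || !l->tail_sentinel || !l->spareels) {
        /* free(NULL) is a no-op, so releasing all three is safe */
        free(l->head_sentinel);
        free(l->tail_sentinel);
        free(l->spareels);
        l->head_sentinel = l->tail_sentinel = NULL;
        l->spareels = NULL;
        return -1;
    }
    l->head_sentinel->next = l->tail_sentinel;
    l->head_sentinel->prev = NULL;
    l->head_sentinel->data = NULL;
    l->tail_sentinel->prev = l->head_sentinel;
    l->tail_sentinel->next = NULL;
    l->tail_sentinel->data = NULL;
    l->numels = 0;
    return 0;
}

typedef struct {
    int msgid;
    int status;
    void *udata;
    mini_list_t queue;
} mini_msg_t;

/* Returns NULL instead of dereferencing a failed allocation. */
mini_msg_t *mini_msg_new(int msgid, void *udata) {
    mini_msg_t *msg = malloc(sizeof *msg);
    if (msg == NULL)
        return NULL;
    msg->msgid = msgid;
    msg->status = 0; /* placeholder for MSG_STAT_NEW */
    msg->udata = udata;
    if (mini_list_init(&msg->queue) != 0) {
        free(msg);
        return NULL;
    }
    return msg;
}

void mini_msg_free(mini_msg_t *msg) {
    if (msg == NULL)
        return;
    free(msg->queue.head_sentinel);
    free(msg->queue.tail_sentinel);
    free(msg->queue.spareels);
    free(msg);
}
```

Note that an unchecked NULL would normally crash at the first write through the pointer rather than inside malloc() itself, so these checks are good practice but may not explain the stack trace in the question.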
Source: https://stackoverflow.com/questions/1441017/how-can-malloc-cause-a-sigsegv