For the last week or so I've been investigating a problem in an application where the memory usage accumulates over time. I narrowed it down to a line that copies a
malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here?
The documentation can be called "outdated" or "incorrect". Glibc itself has no documentation for the malloc_trim function, and Linux uses man pages from the man-pages project. The man page for malloc_trim (http://man7.org/linux/man-pages/man3/malloc_trim.3.html) was written from scratch in 2012 by the man-pages maintainer. He probably relied on comments in the glibc malloc/malloc.c source code (http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#675):
    malloc_trim(size_t pad);

    If possible, gives memory back to the system (via negative
    arguments to sbrk) if there is unused memory at the `high' end of
    the malloc pool. You can call this after freeing large blocks of
    memory to potentially reduce the system-level memory requirements
    of a program. However, it cannot guarantee to reduce memory. Under
    some allocation patterns, some large free blocks of memory will be
    locked between two used chunks, so they cannot be given back to
    the system.

    The `pad' argument to malloc_trim represents the amount of free
    trailing space to leave untrimmed. If this argument is zero,
    only the minimum amount of memory to maintain internal data
    structures will be left (one page or less). Non-zero arguments
    can be supplied to maintain enough trailing space to service
    future expected allocations without having to re-obtain memory
    from the system.

    Malloc_trim returns 1 if it actually released any memory, else 0.
    On systems that do not support "negative sbrks", it will always
    return 0.
The actual implementation in glibc is __malloc_trim, and it contains code that iterates over all arenas (http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#4552):
    int
    __malloc_trim (size_t s)
    ...
      mstate ar_ptr = &main_arena;
      do
        {
          (void) mutex_lock (&ar_ptr->mutex);
          result |= mtrim (ar_ptr, s);
          (void) mutex_unlock (&ar_ptr->mutex);

          ar_ptr = ar_ptr->next;
        }
      while (ar_ptr != &main_arena);
Every arena is trimmed using the mtrim() (mTRIm()) function, which calls malloc_consolidate() to convert all free chunks held in fastbins (they are not coalesced at free() time, which is what keeps that path fast) into normal free chunks (which are coalesced with adjacent chunks):
    /* Ensure initialization/consolidation */
    malloc_consolidate (av);

    malloc_consolidate is a specialized version of free() that tears
    down chunks held in fastbins.

    Fastbins
    ...
    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
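To see the effect described above, here is a minimal sketch that churns through many small blocks and then asks glibc to trim. The helper name churn_and_trim and its parameters are made up for this illustration; malloc_trim() is glibc-specific and declared in <malloc.h>. Whether any memory is actually released depends on the glibc version and the allocation pattern, so the sketch only passes the return value through.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_trim (glibc-specific) */
#include <stdlib.h>

/* Hypothetical helper: allocate `count` blocks of `size` bytes, free them
   all, then call malloc_trim(0).
   Returns malloc_trim's result (1 if memory was released, else 0),
   or -1 if an allocation failed. */
int churn_and_trim(size_t count, size_t size)
{
    assert(count > 0 && size > 0);

    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    /* Allocate the small blocks; stop at the first failure. */
    size_t allocated = 0;
    while (allocated < count) {
        blocks[allocated] = malloc(size);
        if (blocks[allocated] == NULL)
            break;
        allocated++;
    }

    /* Free everything that was allocated. Small chunks may end up in
       fastbins, where they are not coalesced. */
    for (size_t i = 0; i < allocated; i++)
        free(blocks[i]);
    free(blocks);

    if (allocated != count)
        return -1;

    /* Consolidate and trim; pad = 0 keeps as little trailing space as possible. */
    return malloc_trim(0);
}
```

A call such as churn_and_trim(10000, 32) can be compared against the process RSS before and after to check whether trimming helps for a given workload.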
The problem is, when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, such that the fastbins from previous arenas/heaps are never reused.
This is strange. By design, the maximum number of arenas in glibc malloc is limited to cpu_core_count * 8 (on 64-bit platforms) or cpu_core_count * 2 (on 32-bit platforms), or to the value set through the environment variable MALLOC_ARENA_MAX / the mallopt parameter M_ARENA_MAX.
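As a rough sketch of the mallopt route, the arena limit can be set from code early in the program, before any worker threads are started. The helper name limit_arenas is invented for this example; mallopt() and M_ARENA_MAX come from glibc's <malloc.h>.

```c
#include <assert.h>
#include <malloc.h>   /* mallopt, M_ARENA_MAX (glibc-specific) */

/* Hypothetical helper: cap the number of malloc arenas.
   Intended to be called early, before worker threads are created.
   Returns nonzero on success and 0 on error (mallopt's convention). */
int limit_arenas(int max_arenas)
{
    assert(max_arenas > 0);
    return mallopt(M_ARENA_MAX, max_arenas);
}
```

The same limit can be applied without recompiling by setting the environment variable when launching the program, for example MALLOC_ARENA_MAX=2 (the value 2 is only an illustration).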
You can limit the number of arenas for your application, call malloc_trim() periodically, or call malloc() with a "large" size (which triggers malloc_consolidate) and then free() the result from your threads just before they are destroyed:
    _int_malloc (mstate av, size_t bytes)
    ...
      if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
        // fastbin allocation path
    ...
      if (in_smallbin_range (nb))
        // smallbin path; malloc_consolidate may be called
    ...
      /*
         If this is a large request, consolidate fastbins before continuing.
         While it might look excessive to kill all fastbins before
         even seeing if there is space available, this avoids
         fragmentation problems normally associated with fastbins.
         Also, in practice, programs tend to have runs of either small or
         large requests, but less often mixtures, so consolidation is not
         invoked all that often in most programs. And the programs that
         it is called frequently in otherwise tend to fragment.
       */

      else
        {
          idx = largebin_index (nb);
          if (have_fastchunks (av))
            malloc_consolidate (av);
        }
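The "large malloc, then free" trick could look roughly like the sketch below. The helper name and the 64 KiB size are assumptions made for this illustration; the only requirement taken from the source above is that the request is outside the fastbin and smallbin ranges so that _int_malloc takes the largebin branch. Behaviour may differ between glibc versions, so treat it as an experiment rather than a guaranteed fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: 64 KiB is well above the smallbin range, so the request
   takes the largebin path in _int_malloc. The value is illustrative. */
#define CONSOLIDATE_HINT_SIZE ((size_t) 64 * 1024)

/* Hypothetical helper: call as the last step of a thread's start routine.
   Returns 0 on success, -1 if the large allocation failed. */
int consolidate_before_exit(void)
{
    assert(CONSOLIDATE_HINT_SIZE > 1024);

    /* volatile keeps the compiler from optimizing the malloc/free pair away. */
    volatile char *p = malloc(CONSOLIDATE_HINT_SIZE);
    if (p == NULL)
        return -1;

    p[0] = 0;            /* touch the block so the allocation is really used */
    free((void *) p);
    return 0;
}
```

The call has to happen on the thread whose arena should be consolidated, since malloc() normally serves a request from the calling thread's own arena.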
PS: there is a comment in the man page source for malloc_trim (https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65):
+.SH NOTES
+This function only releases memory in the main arena.
+.\" malloc/malloc.c::mTRIm():
+.\" return result | (av == &main_arena ? sYSTRIm (pad, av) : 0);
Yes, there is a check for main_arena, but it sits at the very end of mTRIm(), the implementation of malloc_trim, and it only guards the call to sbrk() with a negative offset. Since 2007 (glibc 2.8 and newer) there is another method of returning memory to the OS: madvise(MADV_DONTNEED), which is used in all arenas (and was not documented by the author of the glibc patch or by the author of the man page). Consolidation is performed for every arena. There is also code for trimming (munmapping) the top chunk of mmap-ed heaps (heap_trim / shrink_heap, called from the slow path of free()), but it is not called from malloc_trim.
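To illustrate what madvise(MADV_DONTNEED) does to a region, here is a small standalone sketch, independent of malloc. The function name dontneed_demo is invented for this example. It relies on the Linux behaviour that, for a private anonymous mapping, pages dropped with MADV_DONTNEED are refilled with zeros on the next access.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and madvise on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical demo: map `len` bytes, dirty them, then hand the pages
   back to the kernel with madvise(MADV_DONTNEED).
   Returns the first byte read after the call, or -1 on error. */
int dontneed_demo(size_t len)
{
    assert(len > 0);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so the mapping becomes resident. */
    memset(p, 0xAB, len);

    /* Tell the kernel the contents are no longer needed; the address
       range stays mapped, but the physical pages can be reclaimed. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The next access faults in a fresh zero-filled page. */
    int first = p[0];
    munmap(p, len);
    return first;
}
```

This is why memory released this way lowers RSS without shrinking the address space: the mapping is kept, only the backing pages are given up.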