I would like to calculate the memory usage of a single process. After a little research I came across smaps and statm.
First of all, what are smaps and statm?
I think statm is an approximated simplification of smaps, which is more expensive to get. I came to this conclusion after I looked at the source:
smaps
The information you see in smaps is defined in /fs/proc/task_mmu.c:
static int show_smap(struct seq_file *m, void *v, int is_pid)
{
    (...)
    struct mm_walk smaps_walk = {
        .pmd_entry = smaps_pte_range,
        .mm = vma->vm_mm,
        .private = &mss,
    };

    memset(&mss, 0, sizeof mss);
    walk_page_vma(vma, &smaps_walk);
    show_map_vma(m, vma, is_pid);
    seq_printf(m,
               (...)
               "Rss: %8lu kB\n"
               (...)
               mss.resident >> 10,
The mss structure is handed to walk_page_vma, defined in /mm/pagewalk.c. However, the mss member resident is not filled in by walk_page_vma itself. Instead, walk_page_vma calls the callback specified in smaps_walk:
    .pmd_entry = smaps_pte_range,
    .private = &mss,
like this:
    if (walk->pmd_entry)
        err = walk->pmd_entry(pmd, addr, next, walk);
So what does our callback, smaps_pte_range in /fs/proc/task_mmu.c, do?
It calls smaps_pte_entry or smaps_pmd_entry, depending on the circumstances, and both of them call smaps_account(), which in turn updates the resident size. All of these functions are defined in the already linked task_mmu.c, so I didn't post the relevant code snippets; they can easily be found in the linked sources.
PTE stands for Page Table Entry and PMD for Page Middle Directory. So basically we iterate through the page table entries associated with the given process and update the RAM usage depending on the circumstances.
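To make the callback pattern above concrete, here is a toy user-space model of it. This is only a sketch: toy_page, toy_stats, toy_walk, toy_account and toy_walk_pages are names I made up for illustration, and the real kernel structures and checks are far more involved. The point is just that the generic walker knows nothing about statistics and the callback reaches its accumulator through the private pointer.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumed page size for this toy model */

/* Toy stand-in for a page table entry; not a kernel type. */
struct toy_page { int present; };

/* Toy stand-in for the statistics accumulator (mss in the kernel code). */
struct toy_stats { unsigned long resident_bytes; };

/* Toy stand-in for struct mm_walk: a callback plus an opaque pointer. */
struct toy_walk {
    int (*entry)(const struct toy_page *page, struct toy_walk *walk);
    void *private;
};

/* Callback: account one page if it is present in RAM. */
int toy_account(const struct toy_page *page, struct toy_walk *walk)
{
    struct toy_stats *stats = walk->private;

    if (page->present)
        stats->resident_bytes += TOY_PAGE_SIZE;
    return 0;
}

/* Generic walker: visits every entry and delegates to the callback. */
int toy_walk_pages(const struct toy_page *pages, size_t n,
                   struct toy_walk *walk)
{
    assert(walk != NULL);
    for (size_t i = 0; i < n; i++) {
        if (walk->entry) {
            int err = walk->entry(&pages[i], walk);
            if (err)
                return err;   /* stop on the first callback error */
        }
    }
    return 0;
}
```

Walking four toy pages of which three are present leaves resident_bytes at 3 * 4096 = 12288, and shifting that right by 10 (as the seq_printf call does with mss.resident) gives 12 kB.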
statm
The information you see in statm is defined in /fs/proc/array.c:
int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
                   struct pid *pid, struct task_struct *task)
{
    unsigned long size = 0, resident = 0, shared = 0, text = 0, data = 0;
    struct mm_struct *mm = get_task_mm(task);

    if (mm) {
        size = task_statm(mm, &shared, &text, &data, &resident);
        mmput(mm);
    }
    seq_put_decimal_ull(m, 0, size);
    seq_put_decimal_ull(m, ' ', resident);
    seq_put_decimal_ull(m, ' ', shared);
    seq_put_decimal_ull(m, ' ', text);
    seq_put_decimal_ull(m, ' ', 0);
    seq_put_decimal_ull(m, ' ', data);
    seq_put_decimal_ull(m, ' ', 0);
    seq_putc(m, '\n');
    return 0;
}
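From user space, the line this function prints can be parsed like this. It is a minimal sketch, not a definitive implementation: parse_statm, pages_to_kb and struct statm_fields are helper names I made up, the field names follow the proc(5) man page, and the values are assumed to be in pages, with a page size that is a multiple of 1024 bytes.

```c
#include <assert.h>
#include <stdio.h>

/* The seven space-separated fields of a statm line, all in pages. */
struct statm_fields {
    unsigned long size, resident, shared, text, lib, data, dt;
};

/* Parse one statm line; returns 0 on success, -1 if a field is missing. */
int parse_statm(const char *line, struct statm_fields *out)
{
    assert(line != NULL && out != NULL);

    int n = sscanf(line, "%lu %lu %lu %lu %lu %lu %lu",
                   &out->size, &out->resident, &out->shared,
                   &out->text, &out->lib, &out->data, &out->dt);
    return n == 7 ? 0 : -1;
}

/* Convert a page count to kB (assumes page_size_bytes % 1024 == 0). */
unsigned long pages_to_kb(unsigned long pages, unsigned long page_size_bytes)
{
    return pages * (page_size_bytes / 1024);
}
```

In a real program you would read the line from /proc/<pid>/statm with fgets and get the page size from sysconf(_SC_PAGESIZE). For example, a resident value of 567 pages with 4096-byte pages is 2268 kB.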
This time, resident is filled in by task_statm. It has two implementations, one in /fs/proc/task_mmu.c and a second in /fs/proc/task_nomm.c. Since they're almost surely mutually exclusive, I'll focus on the implementation in task_mmu.c (which also contains the smaps code shown earlier). In this implementation we see that
unsigned long task_statm(struct mm_struct *mm,
                         unsigned long *shared, unsigned long *text,
                         unsigned long *data, unsigned long *resident)
{
    *shared = get_mm_counter(mm, MM_FILEPAGES);
    (...)
    *resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
    return mm->total_vm;
}
it queries some counters, namely MM_FILEPAGES and MM_ANONPAGES. These counters are modified during various memory operations, such as do_wp_page, defined in /mm/memory.c. All of the modifications seem to be made by files located in /mm/, and there are quite a lot of them, so I didn't include them here.
smaps does complicated iteration through all referenced memory regions and updates resident size using the collected information. statm uses data that was already calculated by someone else.
The most important part is that while smaps collects the data independently each time, statm uses counters that get incremented or decremented during the process life cycle. There are a lot of places that need to do the bookkeeping, and perhaps some places don't update the counters like they should. That's why, IMO, statm is inferior to smaps, even if it takes fewer CPU cycles to complete.
Please note that this is a conclusion I drew based on common sense, and I might be wrong: perhaps there are no internal inconsistencies in the counter incrementing and decrementing, and instead the counters simply count some pages differently than smaps does. At this point I believe it'd be wise to take it to some experienced kernel maintainers.
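One way to check this empirically is to add up the Rss lines of smaps for a process and compare the total with the resident field of statm multiplied by the page size. Below is a rough sketch of the summing part. sum_smaps_rss_kb is a name I made up, and it assumes the whole smaps content has already been read into a NUL-terminated buffer and that each mapping reports a line of the form "Rss: <n> kB", matching the seq_printf format shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sum the values of all "Rss:" lines in smaps-formatted text, in kB. */
unsigned long sum_smaps_rss_kb(const char *text)
{
    assert(text != NULL);

    unsigned long total_kb = 0;
    const char *line = text;

    while (*line != '\0') {
        /* Only lines that start with "Rss:" carry the resident size. */
        if (strncmp(line, "Rss:", 4) == 0)
            total_kb += strtoul(line + 4, NULL, 10); /* skips the padding */

        const char *eol = strchr(line, '\n');
        if (eol == NULL)
            break;          /* last line had no trailing newline */
        line = eol + 1;
    }
    return total_kb;
}
```

If the two numbers differ for the same process, that alone does not say which one is "right"; the process can also change between the two reads, so the comparison is only indicative.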
I would suggest looking through the source code of the 'top' command-line tool. It gets all its info from /proc as well, so it may be a good reference.