Linux OS: /proc/[pid]/smaps vs /proc/[pid]/statm

面向向阳花 2021-02-08 15:59

I would like to calculate the memory usage of a single process. After a little research I came across smaps and statm.

First of all what is smaps and statm? W

2 answers
  • 2021-02-08 16:06

    I think statm is an approximate simplification of smaps, and smaps is the more expensive of the two to produce. I came to this conclusion after looking at the source:

    smaps

    The information you see in smaps is defined in /fs/proc/task_mmu.c:

    static int show_smap(struct seq_file *m, void *v, int is_pid)
    {
            (...)
    
            struct mm_walk smaps_walk = {
                    .pmd_entry = smaps_pte_range,
                    .mm = vma->vm_mm,
                    .private = &mss,
            };
    
            memset(&mss, 0, sizeof mss);
            walk_page_vma(vma, &smaps_walk);
            show_map_vma(m, vma, is_pid);
    
            seq_printf(m,
                    (...)
                    "Rss:            %8lu kB\n"
                    (...)
                    mss.resident >> 10,
    

    The mss structure is passed to walk_page_vma, which is defined in /mm/pagewalk.c. However, the mss member resident is not filled in by walk_page_vma itself - instead, walk_page_vma calls the callback specified in smaps_walk:

    .pmd_entry = smaps_pte_range,
    .private = &mss,
    

    like this:

      if (walk->pmd_entry)
          err = walk->pmd_entry(pmd, addr, next, walk);
    

    So what does our callback, smaps_pte_range in /fs/proc/task_mmu.c, do? Depending on the circumstances it calls smaps_pte_entry or smaps_pmd_entry, both of which call smaps_account(), which in turn... updates the resident size! All of these functions are defined in the same task_mmu.c, so I didn't post the relevant code snippets; they are easy to find in the sources.

    PTE stands for Page Table Entry and PMD for Page Middle Directory. So basically we iterate through the page table entries associated with the given process and update the RAM usage depending on what each entry maps.
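
    From user space, the result of that walk shows up as one "Rss:" line per mapping, so a process total has to be summed over all mappings. Below is a minimal sketch of that summation, assuming the usual "<Field>: <n> kB" line format of smaps. The helper names sum_smaps_field_kb and smaps_field_kb_for_pid are hypothetical, not part of any kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum one "<field>: <n> kB" entry (e.g. "Rss") over every mapping in an
 * smaps-formatted stream. Returns the total in kB, or -1 on a read error.
 * Hypothetical helper for illustration; assumes lines shorter than 512 bytes.
 */
long sum_smaps_field_kb(FILE *f, const char *field)
{
    char line[512];
    size_t flen;
    long total = 0;

    assert(f != NULL && field != NULL);
    flen = strlen(field);

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long kb;

        /* Require the colon right after the name so "Swap" does not match "SwapPss". */
        if (strncmp(line, field, flen) != 0 || line[flen] != ':')
            continue;
        if (sscanf(line + flen + 1, "%lu", &kb) == 1)
            total += (long)kb;
    }
    return ferror(f) ? -1 : total;
}

/* Convenience wrapper: open /proc/<pid>/smaps and sum the given field. */
long smaps_field_kb_for_pid(long pid, const char *field)
{
    char path[64];
    FILE *f;
    long total;

    snprintf(path, sizeof path, "/proc/%ld/smaps", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1; /* no such process, or no permission to read it */
    total = sum_smaps_field_kb(f, field);
    fclose(f);
    return total;
}
```

    Calling smaps_field_kb_for_pid(pid, "Rss") for a running process should give its resident size in kB as smaps reports it. Every read of the file makes the kernel repeat the page table walk described above, which is where the extra cost comes from.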

    statm

    The information you see in statm is defined in /fs/proc/array.c:

    int proc_pid_statm(struct seq_file *m, struct pid_namespace *ns,
                    struct pid *pid, struct task_struct *task)
    {
            unsigned long size = 0, resident = 0, shared = 0, text = 0, data = 0;
            struct mm_struct *mm = get_task_mm(task);
    
            if (mm) {
                    size = task_statm(mm, &shared, &text, &data, &resident);
                    mmput(mm);
            }
            seq_put_decimal_ull(m, 0, size);
            seq_put_decimal_ull(m, ' ', resident);
            seq_put_decimal_ull(m, ' ', shared);
            seq_put_decimal_ull(m, ' ', text);
            seq_put_decimal_ull(m, ' ', 0);
            seq_put_decimal_ull(m, ' ', data);
            seq_put_decimal_ull(m, ' ', 0);
            seq_putc(m, '\n');
            return 0;
    }
    

    This time, resident is filled in by task_statm. It has two implementations, one in /fs/proc/task_mmu.c and a second in /fs/proc/task_nomm.c. Since they're almost surely mutually exclusive, I'll focus on the implementation in task_mmu.c (the same file that contains the smaps code). In this implementation we see that

    unsigned long task_statm(struct mm_struct *mm,
                        unsigned long *shared, unsigned long *text,
                        unsigned long *data, unsigned long *resident)
    {
            *shared = get_mm_counter(mm, MM_FILEPAGES);
            (...)
            *resident = *shared + get_mm_counter(mm, MM_ANONPAGES);
            return mm->total_vm;
    }
    

    it queries some counters, namely MM_FILEPAGES and MM_ANONPAGES. These counters are modified during various memory operations, such as do_wp_page in /mm/memory.c. All of the modifications seem to be made by files located in /mm/, and there are quite a lot of them, so I didn't include them here.
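
    On the reading side, the seven numbers printed by proc_pid_statm can be parsed with a single sscanf. The sketch below assumes the field order shown in the code above (size, resident, shared, text, 0, data, 0) and that all values are in pages. The struct and function names are hypothetical, and the fifth and seventh fields are labelled lib and dt following the proc(5) man page.

```c
#include <assert.h>
#include <stdio.h>

/* The seven fields of /proc/[pid]/statm, all measured in pages. */
struct statm {
    unsigned long size;     /* total program size (mm->total_vm) */
    unsigned long resident; /* shared + anonymous resident pages */
    unsigned long shared;   /* resident file-backed pages */
    unsigned long text;
    unsigned long lib;      /* printed as 0 by the kernel code above */
    unsigned long data;
    unsigned long dt;       /* printed as 0 by the kernel code above */
};

/* Parse one statm line. Returns 0 on success, -1 if fewer than seven fields are found. */
int parse_statm(const char *line, struct statm *out)
{
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line, "%lu %lu %lu %lu %lu %lu %lu",
               &out->size, &out->resident, &out->shared, &out->text,
               &out->lib, &out->data, &out->dt);
    return n == 7 ? 0 : -1;
}

/*
 * Convert a page count to kB. page_size is in bytes and is assumed to be a
 * multiple of 1024; on Linux it can be obtained with sysconf(_SC_PAGESIZE).
 */
unsigned long pages_to_kb(unsigned long pages, unsigned long page_size)
{
    return pages * (page_size / 1024);
}
```

    With 4096-byte pages, a resident value of 200 corresponds to 800 kB, which is the number to compare against the summed Rss lines of smaps.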

    Conclusion

    smaps performs a complicated iteration over all referenced memory regions and updates the resident size from what it finds. statm reads values that were already computed elsewhere in the kernel.

    The most important part is that while smaps collects the data independently each time, statm uses counters that are incremented and decremented over the life cycle of the process. There are a lot of places that need to do this bookkeeping, and perhaps some of them don't update the counters as they should. That's why IMO statm is inferior to smaps, even if it takes fewer CPU cycles to complete.

    Please note that this is the conclusion I drew based on common sense, but I might be wrong - perhaps there are no internal inconsistencies in counter decrementing and incrementing, and instead, they might count some pages differently than smaps. At this point I believe it'd be wise to take it to some experienced kernel maintainers.
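
    To make the contrast concrete, here is a toy user-space model of the two strategies. It is not kernel code, and every name in it is made up for illustration. The update_counter flag only models the hypothetical "forgotten bookkeeping" case speculated about above; it does not claim that the kernel actually behaves this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8

/* Toy address space: per-page state plus an incrementally maintained counter. */
struct toy_mm {
    bool present[TOY_PAGES]; /* stand-in for the page tables */
    long rss_counter;        /* stand-in for the MM_* counters */
};

/* statm-style: a constant-time read of the maintained counter. */
long toy_rss_from_counter(const struct toy_mm *mm)
{
    assert(mm != NULL);
    return mm->rss_counter;
}

/* smaps-style: recount from scratch by walking every page. */
long toy_rss_from_walk(const struct toy_mm *mm)
{
    long n = 0;

    assert(mm != NULL);
    for (size_t i = 0; i < TOY_PAGES; i++)
        if (mm->present[i])
            n++;
    return n;
}

/*
 * Make a page resident. Passing update_counter = false models a code path
 * that changes the page state but skips the counter update.
 */
void toy_map_page(struct toy_mm *mm, size_t idx, bool update_counter)
{
    assert(mm != NULL && idx < TOY_PAGES);
    if (mm->present[idx])
        return; /* already resident, nothing to account */
    mm->present[idx] = true;
    if (update_counter)
        mm->rss_counter++;
}
```

    As long as every toy_map_page call passes true, both readers agree. After a single call with false, the counter stays behind while the walk still reports the true number of resident pages.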

  • 2021-02-08 16:23

    I would suggest looking through the source code of the 'top' command. It gets all of its information from /proc as well, so it may be a good reference.
