What do 'real', 'user' and 'sys' mean in the output of time(1)?

执笔经年 2020-11-22 00:35
$ time foo
real        0m0.003s
user        0m0.000s
sys         0m0.004s
$

What do 'real', 'user' and 'sys' mean in the output of time?

7 Answers
  •  囚心锁ツ
    2020-11-22 00:53

    Minimal runnable POSIX C examples

    To make things more concrete, I want to exemplify a few extreme cases of time with some minimal C test programs.

    All programs can be compiled and run with:

    gcc -ggdb3 -o main.out -pthread -std=c99 -pedantic-errors -Wall -Wextra main.c
    time ./main.out
    

    and have been tested on Ubuntu 18.10, GCC 8.2.0, glibc 2.28, Linux kernel 4.18, on a ThinkPad P51 laptop with an Intel Core i7-7820HQ CPU (4 cores / 8 threads) and 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB).

    sleep

    Non-busy sleep does not count in either user or sys, only real.

    For example, a program that sleeps for a second:

    #define _XOPEN_SOURCE 700
    #include <stdlib.h>
    #include <unistd.h>
    
    int main(void) {
        sleep(1);
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    outputs something like:

    real    0m1.003s
    user    0m0.001s
    sys     0m0.003s
    

    The same holds for programs that block waiting for IO to become available.

    For example, the following program waits for the user to enter a character and press enter:

    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void) {
        printf("%c\n", getchar());
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    And if you wait for about one second before pressing enter, the output is just like in the sleep example:

    real    0m1.003s
    user    0m0.001s
    sys     0m0.003s
    

    For this reason time can help you distinguish between CPU and IO bound programs: What do the terms "CPU bound" and "I/O bound" mean?
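
    For contrast, here is a minimal sketch (my own illustration, not one of the measured examples) of the opposite extreme: a purely CPU-bound program that busy-loops on useless arithmetic instead of sleeping, so user ends up close to real and sys stays near zero.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Arbitrary iteration count: just enough useless arithmetic to keep
         * one core busy for a noticeable amount of wall clock time. */
        uint64_t i, result = 1;
        for (i = 0; i < 1000000000ULL; ++i) {
            result = (result * result) - (3 * result) + 1;
        }
        /* Print the result so the compiler cannot optimize the loop away. */
        printf("%" PRIu64 "\n", result);
        return EXIT_SUCCESS;
    }

    Compiled and run with the same commands as above, real and user should come out roughly equal, with sys close to zero.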

    Multiple threads

    The following example does niters iterations of useless purely CPU-bound work on nthreads threads:

    #define _XOPEN_SOURCE 700
    #include <assert.h>
    #include <inttypes.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    
    uint64_t niters;
    
    void* my_thread(void *arg) {
        uint64_t *argument, i, result;
        argument = (uint64_t *)arg;
        result = *argument;
        for (i = 0; i < niters; ++i) {
            result = (result * result) - (3 * result) + 1;
        }
        *argument = result;
        return NULL;
    }
    
    int main(int argc, char **argv) {
        size_t nthreads;
        pthread_t *threads;
        uint64_t rc, i, *thread_args;
    
        /* CLI args. */
        if (argc > 1) {
            niters = strtoll(argv[1], NULL, 0);
        } else {
            niters = 1000000000;
        }
        if (argc > 2) {
            nthreads = strtoll(argv[2], NULL, 0);
        } else {
            nthreads = 1;
        }
        threads = malloc(nthreads * sizeof(*threads));
        thread_args = malloc(nthreads * sizeof(*thread_args));
    
        /* Create all threads */
        for (i = 0; i < nthreads; ++i) {
            thread_args[i] = i;
            rc = pthread_create(
                &threads[i],
                NULL,
                my_thread,
                (void*)&thread_args[i]
            );
            assert(rc == 0);
        }
    
        /* Wait for all threads to complete */
        for (i = 0; i < nthreads; ++i) {
            rc = pthread_join(threads[i], NULL);
            assert(rc == 0);
            printf("%" PRIu64 " %" PRIu64 "\n", i, thread_args[i]);
        }
    
        free(threads);
        free(thread_args);
        return EXIT_SUCCESS;
    }
    

    GitHub upstream + plot code.

    Then we plot wall, user and sys as a function of the number of threads for a fixed 10^10 iterations on my 8 hyperthread CPU:

    (Plot not shown: real, user and sys as a function of the number of threads.) Plot data.

    From the graph, we see that:

    • for a CPU intensive single core application, wall and user are about the same

    • with 2 threads (i.e. 2 cores in use), user is about 2x wall, which means that the user time is counted across all threads.

      user basically doubled, while wall stayed the same.

    • this continues up to 8 threads, which matches the number of hyperthreads on my computer.

      After 8, wall starts to increase as well, because we don't have any extra CPUs to put more work in a given amount of time!

      The ratio plateaus at this point.

    Note that this graph is only so clear and simple because the work is purely CPU-bound: if it were memory bound, performance would fall off much earlier, with fewer cores, because memory accesses would be the bottleneck, as shown at What do the terms "CPU bound" and "I/O bound" mean?

    Quickly checking that wall < user is a simple way to determine that a program is multithreaded, and the closer the user/wall ratio is to the number of cores, the more effective the parallelization is, e.g.:

    • multithreaded linkers: Can gcc use multiple cores when linking?
    • C++ parallel sort: Are C++17 Parallel Algorithms implemented already?

    Sys heavy work with sendfile

    The heaviest sys workload I could come up with was to use sendfile, which does a file copy operation in kernel space: Copy a file in a sane, safe and efficient way

    So I expected this in-kernel memcpy to be a CPU intensive operation.

    First I initialize a large 10GiB random file with:

    dd if=/dev/urandom of=sendfile.in.tmp bs=1K count=10M
    

    Then run the code:

    #define _GNU_SOURCE
    #include <assert.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>
    
    int main(int argc, char **argv) {
        char *source_path, *dest_path;
        int source, dest;
        struct stat stat_source;
        if (argc > 1) {
            source_path = argv[1];
        } else {
            source_path = "sendfile.in.tmp";
        }
        if (argc > 2) {
            dest_path = argv[2];
        } else {
            dest_path = "sendfile.out.tmp";
        }
        source = open(source_path, O_RDONLY);
        assert(source != -1);
        dest = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
        assert(dest != -1);
        assert(fstat(source, &stat_source) != -1);
        assert(sendfile(dest, source, 0, stat_source.st_size) != -1);
        assert(close(source) != -1);
        assert(close(dest) != -1);
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    which gives mostly system time, as expected:

    real    0m2.175s
    user    0m0.001s
    sys     0m1.476s
    

    I was also curious to see if time would distinguish between syscalls of different processes, so I tried:

    time ./sendfile.out sendfile.in1.tmp sendfile.out1.tmp &
    time ./sendfile.out sendfile.in2.tmp sendfile.out2.tmp &
    

    And the result was:

    real    0m3.651s
    user    0m0.000s
    sys     0m1.516s
    
    real    0m4.948s
    user    0m0.000s
    sys     0m1.562s
    

    The sys time is about the same for both as for a single process, but the wall time is larger, likely because the processes are competing for disk read access.

    So it seems that kernel work is in fact accounted to the process that started it.

    Bash source code

    When you run just time on Ubuntu, it uses the Bash keyword, as can be seen from:

    type time
    

    which outputs:

    time is a shell keyword
    

    So we grep the Bash 4.19 source code for the output string:

    git grep '"user\b'
    

    which leads us to the time_command function in execute_cmd.c, which uses:

    • gettimeofday() and getrusage() if both are available
    • times() otherwise

    all of which are Linux system calls and POSIX functions.
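
    As a rough sketch (not Bash's actual code, just the same calls that time_command relies on), the gettimeofday() + getrusage() approach boils down to: take a wall clock timestamp before and after the child runs, and read the child's user and sys times from getrusage(RUSAGE_CHILDREN) once it has been waited for:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        struct timeval start, end;
        struct rusage usage;
        pid_t pid;

        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return EXIT_FAILURE;
        }
        /* Wall clock timestamp before the child starts. */
        gettimeofday(&start, NULL);
        pid = fork();
        if (pid == 0) {
            /* Child: run the given command. */
            execvp(argv[1], &argv[1]);
            _exit(127);
        }
        /* Parent: wait for the child, then collect its resource usage. */
        wait(NULL);
        gettimeofday(&end, NULL);
        getrusage(RUSAGE_CHILDREN, &usage);
        printf("real %f\n", (end.tv_sec - start.tv_sec)
                            + (end.tv_usec - start.tv_usec) / 1e6);
        printf("user %f\n", usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6);
        printf("sys  %f\n", usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6);
        return EXIT_SUCCESS;
    }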

    GNU Coreutils source code

    If we call it as:

    /usr/bin/time
    

    then it uses the GNU Coreutils implementation.

    This one is a bit more complex, but the relevant source seems to be in resuse.c, and it does (see the sketch after this list):

    • a non-POSIX BSD wait3 call if that is available
    • times and gettimeofday otherwise
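
    As an equally rough sketch (again not the actual implementation), the wait3() variant is even more direct, since wait3() reaps the child and fills in a struct rusage for it in one call:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        struct rusage usage;
        int status;
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: run some command, here just sleep 1 for brevity. */
            execlp("sleep", "sleep", "1", (char *)NULL);
            _exit(127);
        }
        /* wait3() reaps the child and reports its resource usage directly. */
        wait3(&status, 0, &usage);
        printf("user %f\n", usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6);
        printf("sys  %f\n", usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6);
        return EXIT_SUCCESS;
    }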
