What do 'real', 'user' and 'sys' mean in the output of time(1)?

执笔经年 2020-11-22 00:35
$ time foo
real        0m0.003s
user        0m0.000s
sys         0m0.004s
$

What do 'real', 'user' and 'sys' mean in the output of time?

7 Answers
  •  囚心锁ツ
    2020-11-22 00:53

    Minimal runnable POSIX C examples

    To make things more concrete, I want to exemplify a few extreme cases of time with some minimal C test programs.

    All programs can be compiled and run with:

    gcc -ggdb3 -o main.out -pthread -std=c99 -pedantic-errors -Wall -Wextra main.c
    time ./main.out
    

    and have been tested on Ubuntu 18.10 with GCC 8.2.0, glibc 2.28 and Linux kernel 4.18, on a ThinkPad P51 laptop with an Intel Core i7-7820HQ CPU (4 cores / 8 threads) and 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB).

    sleep

    Non-busy sleep does not count in either user or sys, only real.

    For example, a program that sleeps for a second:

    #define _XOPEN_SOURCE 700
    #include <stdlib.h>
    #include <unistd.h>
    
    int main(void) {
        sleep(1);
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    outputs something like:

    real    0m1.003s
    user    0m0.001s
    sys     0m0.003s
    

    The same holds for programs that block waiting for IO to become available.

    For example, the following program waits for the user to enter a character and press enter:

    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void) {
        printf("%c\n", getchar());
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    And if you wait for about one second before pressing enter, it outputs, just like the sleep example, something like:

    real    0m1.003s
    user    0m0.001s
    sys     0m0.003s
    

    For this reason time can help you distinguish between CPU and IO bound programs: What do the terms "CPU bound" and "I/O bound" mean?
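
    For contrast, here is a minimal sketch (my addition, not one of the original examples) of a purely CPU-bound single-threaded program; for such a program user should come out close to real, with sys near zero:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void) {
        uint64_t i, result = 0;
        /* Pure CPU work: no sleeping, no IO, no syscalls inside the loop. */
        for (i = 0; i < 1000000000; ++i) {
            result = (result * result) - (3 * result) + 1;
        }
        /* Print the result so the compiler cannot optimize the loop away. */
        printf("%" PRIu64 "\n", result);
        return EXIT_SUCCESS;
    }
    

    Compiled and run like the other examples, I would expect real and user to be within a few milliseconds of each other and sys close to 0m0.000s, i.e. the opposite profile of the sleep and getchar examples above.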

    Multiple threads

    The following example does niters iterations of useless purely CPU-bound work on nthreads threads:

    #define _XOPEN_SOURCE 700
    #include <assert.h>
    #include <inttypes.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    
    uint64_t niters;
    
    void* my_thread(void *arg) {
        uint64_t *argument, i, result;
        argument = (uint64_t *)arg;
        result = *argument;
        for (i = 0; i < niters; ++i) {
            result = (result * result) - (3 * result) + 1;
        }
        *argument = result;
        return NULL;
    }
    
    int main(int argc, char **argv) {
        size_t nthreads;
        pthread_t *threads;
        uint64_t rc, i, *thread_args;
    
        /* CLI args. */
        if (argc > 1) {
            niters = strtoll(argv[1], NULL, 0);
        } else {
            niters = 1000000000;
        }
        if (argc > 2) {
            nthreads = strtoll(argv[2], NULL, 0);
        } else {
            nthreads = 1;
        }
        threads = malloc(nthreads * sizeof(*threads));
        thread_args = malloc(nthreads * sizeof(*thread_args));
    
        /* Create all threads */
        for (i = 0; i < nthreads; ++i) {
            thread_args[i] = i;
            rc = pthread_create(
                &threads[i],
                NULL,
                my_thread,
                (void*)&thread_args[i]
            );
            assert(rc == 0);
        }
    
        /* Wait for all threads to complete */
        for (i = 0; i < nthreads; ++i) {
            rc = pthread_join(threads[i], NULL);
            assert(rc == 0);
            printf("%" PRIu64 " %" PRIu64 "\n", i, thread_args[i]);
        }
    
        free(threads);
        free(thread_args);
        return EXIT_SUCCESS;
    }
    

    GitHub upstream + plot code.
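
    The exact measurement and plotting script is in the upstream repository linked above; as a rough sketch of how such numbers could be gathered (my assumption, not the actual upstream script), GNU time installed as /usr/bin/time can print real, user and sys with its -f format specifiers %e, %U and %S while looping over the thread count:

    for nthreads in 1 2 3 4 5 6 7 8 9 10 11 12; do
        /usr/bin/time -f "$nthreads %e %U %S" ./main.out 10000000000 "$nthreads" > /dev/null
    done
    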

    Then we plot wall, user and sys as a function of the number of threads for a fixed 10^10 iterations on my 8 hyperthread CPU:

    (Plot omitted: real, user and sys time versus number of threads; the raw plot data was linked from the original answer.)

    From the graph, we see that:

    • for a CPU intensive single core application, wall and user are about the same

    • for 2 cores, user is about 2x wall, which means that the user time is counted across all threads.

      user basically doubled, while wall stayed the same.

    • this continues up to 8 threads, which matches the number of hyperthreads on my computer.

      After 8, wall starts to increase as well, because there are no extra CPUs left to do more work in the same amount of time!

      The ratio plateaus at this point.

    Note that this graph is only so clear and simple because the work is purely CPU-bound: if it were memory-bound, we would see a fall in performance much earlier, with fewer cores, because memory accesses would become the bottleneck, as shown at What do the terms "CPU bound" and "I/O bound" mean?

    Quickly checking that wall < user is a simple way to determine that a program is multithreaded, and the closer the user/wall ratio is to the number of cores, the more effective the parallelization is (a worked reading of such output follows the list below), e.g.:

    • multithreaded linkers: Can gcc use multiple cores when linking?
    • C++ parallel sort: Are C++17 Parallel Algorithms implemented already?
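
    For example (illustrative numbers of mine, not a measurement from the runs above), output such as:

    real    0m10.000s
    user    1m19.000s
    sys     0m1.000s
    

    gives user/real ≈ 7.9, which on the 8-hyperthread machine above would indicate that the program kept essentially all hardware threads busy for the whole run.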

    Sys heavy work with sendfile

    The heaviest sys workload I could come up with was to use sendfile, which does a file copy operation in kernel space: Copy a file in a sane, safe and efficient way

    So I imagined that this in-kernel memcpy would be a CPU-intensive operation.

    First I initialize a large 10GiB random file with:

    dd if=/dev/urandom of=sendfile.in.tmp bs=1K count=10M
    

    Then run the code:

    #define _GNU_SOURCE
    #include <assert.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>
    
    int main(int argc, char **argv) {
        char *source_path, *dest_path;
        int source, dest;
        struct stat stat_source;
        if (argc > 1) {
            source_path = argv[1];
        } else {
            source_path = "sendfile.in.tmp";
        }
        if (argc > 2) {
            dest_path = argv[2];
        } else {
            dest_path = "sendfile.out.tmp";
        }
        source = open(source_path, O_RDONLY);
        assert(source != -1);
        dest = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
        assert(dest != -1);
        assert(fstat(source, &stat_source) != -1);
        assert(sendfile(dest, source, 0, stat_source.st_size) != -1);
        assert(close(source) != -1);
        assert(close(dest) != -1);
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    which gives mostly system time, as expected:

    real    0m2.175s
    user    0m0.001s
    sys     0m1.476s
    

    I was also curious to see if time would distinguish between syscalls of different processes, so I tried:

    time ./sendfile.out sendfile.in1.tmp sendfile.out1.tmp &
    time ./sendfile.out sendfile.in2.tmp sendfile.out2.tmp &
    

    And the result was:

    real    0m3.651s
    user    0m0.000s
    sys     0m1.516s
    
    real    0m4.948s
    user    0m0.000s
    sys     0m1.562s
    

    The sys time is about the same for both as for a single process, but the wall time is larger, likely because the processes are competing for disk read access.

    So it seems that time does in fact attribute kernel work to the process that started it.

    Bash source code

    When you run just time on Ubuntu, it uses the Bash keyword, as can be seen from:

    type time
    

    which outputs:

    time is a shell keyword
    

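    One practical consequence of time being a shell keyword rather than an external binary (an aside of mine, not part of the original answer) is that it can time entire pipelines and compound commands, which /usr/bin/time cannot:

    time { sleep 1; sleep 1; }
    time sleep 1 | sleep 2
    

    Both report roughly real 0m2.0s: the first runs the two sleeps sequentially, and the second times the whole pipeline, which finishes when the slower sleep 2 does.
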
    So we grep the Bash source code for the output string:

    git grep '"user\b'
    

    which leads us to execute_cmd.c function time_command, which uses:

    • gettimeofday() and getrusage() if both are available
    • times() otherwise

    all of which are Linux system calls and POSIX functions.
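
    As a rough sketch of that mechanism (my own illustration, not Bash's actual time_command code), a wrapper can take wall-clock timestamps with gettimeofday() around a fork()/exec()/wait() of the command, and then read the children's CPU times with getrusage(RUSAGE_CHILDREN, ...):

    #define _XOPEN_SOURCE 700
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    int main(int argc, char **argv) {
        struct timeval start, end;
        struct rusage usage;
        pid_t pid;
    
        if (argc < 2) {
            fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
            return EXIT_FAILURE;
        }
        /* Wall clock before the child runs. */
        gettimeofday(&start, NULL);
        pid = fork();
        if (pid == -1) {
            perror("fork");
            return EXIT_FAILURE;
        }
        if (pid == 0) {
            /* Child: run the command being timed. */
            execvp(argv[1], &argv[1]);
            perror("execvp");
            _exit(127);
        }
        waitpid(pid, NULL, 0);
        /* Wall clock after the child exits. */
        gettimeofday(&end, NULL);
        /* CPU time consumed by terminated children: user and sys. */
        getrusage(RUSAGE_CHILDREN, &usage);
        printf("real %.3f\n", (end.tv_sec - start.tv_sec)
            + (end.tv_usec - start.tv_usec) / 1e6);
        printf("user %.3f\n", usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6);
        printf("sys  %.3f\n", usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6);
        return EXIT_SUCCESS;
    }
    

    Running it as, for example, ./main.out sleep 1 should print real close to 1.000 with user and sys near 0.000, matching the shell keyword's behavior for the sleep example above.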

    GNU time source code

    If we call it as:

    /usr/bin/time
    

    then it uses the standalone GNU time implementation (from the GNU time package, not Coreutils).

    This one is a bit more complex, but the relevant source seems to be at resuse.c and it does:

    • a non-POSIX BSD wait3 call if that is available
    • times and gettimeofday otherwise
