What killed my process and why?

误落风尘 2020-11-22 09:34

My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.

Recently a user was executing the application

14 answers
  • 2020-11-22 10:06

    The user has the ability to kill his own programs, using kill or Control+C, but I get the impression that's not what happened, and that the user complained to you.

    root has the ability to kill programs of course, but if someone has root on your machine and is killing stuff you have bigger problems.

    If you are not the sysadmin, the sysadmin may have set up quotas on CPU, RAM, or disk usage that automatically kill processes exceeding them.

    Other than those guesses, I'm not sure without more info about the program.

  • 2020-11-22 10:11

    This is the Linux out-of-memory (OOM) killer. Your process was selected due to its 'badness' score, a combination of recentness, resident size (memory in use, rather than just allocated) and other factors. To confirm, check the journal for the current boot:

    sudo journalctl -xb
    

    You'll see a message like:

    Jul 20 11:05:00 someapp kernel: Mem-Info:
    Jul 20 11:05:00 someapp kernel: Node 0 DMA per-cpu:
    Jul 20 11:05:00 someapp kernel: CPU    0: hi:    0, btch:   1 usd:   0
    Jul 20 11:05:00 someapp kernel: Node 0 DMA32 per-cpu:
    Jul 20 11:05:00 someapp kernel: CPU    0: hi:  186, btch:  31 usd:  30
    Jul 20 11:05:00 someapp kernel: active_anon:206043 inactive_anon:6347 isolated_anon:0
                                        active_file:722 inactive_file:4126 isolated_file:0
                                        unevictable:0 dirty:5 writeback:0 unstable:0
                                        free:12202 slab_reclaimable:3849 slab_unreclaimable:14574
                                        mapped:792 shmem:12802 pagetables:1651 bounce:0
                                        free_cma:0
    Jul 20 11:05:00 someapp kernel: Node 0 DMA free:4576kB min:708kB low:884kB high:1060kB active_anon:10012kB inactive_anon:488kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present
    Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 968 968 968
    Jul 20 11:05:00 someapp kernel: Node 0 DMA32 free:44232kB min:44344kB low:55428kB high:66516kB active_anon:814160kB inactive_anon:24900kB active_file:2884kB inactive_file:16500kB unevictable:0kB isolated(anon):0kB isolated
    Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 0 0 0
    Jul 20 11:05:00 someapp kernel: Node 0 DMA: 17*4kB (UEM) 22*8kB (UEM) 15*16kB (UEM) 12*32kB (UEM) 8*64kB (E) 9*128kB (UEM) 2*256kB (UE) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 4580kB
    Jul 20 11:05:00 someapp kernel: Node 0 DMA32: 216*4kB (UE) 601*8kB (UE) 448*16kB (UE) 311*32kB (UEM) 135*64kB (UEM) 74*128kB (UEM) 5*256kB (EM) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 44232kB
    Jul 20 11:05:00 someapp kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    Jul 20 11:05:00 someapp kernel: 17656 total pagecache pages
    Jul 20 11:05:00 someapp kernel: 0 pages in swap cache
    Jul 20 11:05:00 someapp kernel: Swap cache stats: add 0, delete 0, find 0/0
    Jul 20 11:05:00 someapp kernel: Free swap  = 0kB
    Jul 20 11:05:00 someapp kernel: Total swap = 0kB
    Jul 20 11:05:00 someapp kernel: 262141 pages RAM
    Jul 20 11:05:00 someapp kernel: 7645 pages reserved
    Jul 20 11:05:00 someapp kernel: 264073 pages shared
    Jul 20 11:05:00 someapp kernel: 240240 pages non-shared
    Jul 20 11:05:00 someapp kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    Jul 20 11:05:00 someapp kernel: [  241]     0   241    13581     1610      26        0             0 systemd-journal
    Jul 20 11:05:00 someapp kernel: [  246]     0   246    10494      133      22        0         -1000 systemd-udevd
    Jul 20 11:05:00 someapp kernel: [  264]     0   264    29174      121      26        0         -1000 auditd
    Jul 20 11:05:00 someapp kernel: [  342]     0   342    94449      466      67        0             0 NetworkManager
    Jul 20 11:05:00 someapp kernel: [  346]     0   346   137495     3125      88        0             0 tuned
    Jul 20 11:05:00 someapp kernel: [  348]     0   348    79595      726      60        0             0 rsyslogd
    Jul 20 11:05:00 someapp kernel: [  353]    70   353     6986       72      19        0             0 avahi-daemon
    Jul 20 11:05:00 someapp kernel: [  362]    70   362     6986       58      18        0             0 avahi-daemon
    Jul 20 11:05:00 someapp kernel: [  378]     0   378     1621       25       8        0             0 iprinit
    Jul 20 11:05:00 someapp kernel: [  380]     0   380     1621       26       9        0             0 iprupdate
    Jul 20 11:05:00 someapp kernel: [  384]    81   384     6676      142      18        0          -900 dbus-daemon
    Jul 20 11:05:00 someapp kernel: [  385]     0   385     8671       83      21        0             0 systemd-logind
    Jul 20 11:05:00 someapp kernel: [  386]     0   386    31573      153      15        0             0 crond
    Jul 20 11:05:00 someapp kernel: [  391]   999   391   128531     2440      48        0             0 polkitd
    Jul 20 11:05:00 someapp kernel: [  400]     0   400     9781       23       8        0             0 iprdump
    Jul 20 11:05:00 someapp kernel: [  419]     0   419    27501       32      10        0             0 agetty
    Jul 20 11:05:00 someapp kernel: [  855]     0   855    22883      258      43        0             0 master
    Jul 20 11:05:00 someapp kernel: [  862]    89   862    22926      254      44        0             0 qmgr
    Jul 20 11:05:00 someapp kernel: [23631]     0 23631    20698      211      43        0         -1000 sshd
    Jul 20 11:05:00 someapp kernel: [12884]     0 12884    81885     3754      80        0             0 firewalld
    Jul 20 11:05:00 someapp kernel: [18130]     0 18130    33359      291      65        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18132]  1000 18132    33791      748      64        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18133]  1000 18133    28867      122      13        0             0 bash
    Jul 20 11:05:00 someapp kernel: [18428]    99 18428   208627    42909     151        0             0 node
    Jul 20 11:05:00 someapp kernel: [18486]    89 18486    22909      250      46        0             0 pickup
    Jul 20 11:05:00 someapp kernel: [18515]  1000 18515   352905   141851     470        0             0 npm
    Jul 20 11:05:00 someapp kernel: [18520]     0 18520    33359      291      66        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18522]  1000 18522    33359      294      64        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18523]  1000 18523    28866      115      12        0             0 bash
    Jul 20 11:05:00 someapp kernel: Out of memory: Kill process 18515 (npm) score 559 or sacrifice child
    Jul 20 11:05:00 someapp kernel: Killed process 18515 (npm) total-vm:1411620kB, anon-rss:567404kB, file-rss:0kB
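
The dump above is long, and the two lines that matter are at the very end. As a rough sketch, a small filter can pull the victim's PID, name and anonymous RSS out of the kernel log. The function name `oom_victims` is made up for this example, and the pattern assumes the "Killed process PID (name) ... anon-rss:NkB" layout shown above; other kernel versions may word the line differently.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "pid name anon_rss_kB" for every OOM kill it finds.
# Assumes the "Killed process <pid> (<name>) ... anon-rss:<n>kB" format.
oom_victims() {
    sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*anon-rss:([0-9]+)kB.*/\1 \2 \3/p'
}

# Typical use (needs permission to read the kernel log):
#   sudo journalctl -k -b | oom_victims
```

Fed the last line of the log above, this prints `18515 npm 567404`.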
    
  • 2020-11-22 10:12

    As dwc and Adam Jaskiewicz have stated, the culprit is likely the OOM Killer. However, the next question that follows is: How do I prevent this?

    There are several ways:

    1. Give your system more RAM if you can (easy if it's a VM)
    2. Make sure the OOM killer chooses a different process.
    3. Disable the OOM Killer
    4. Choose a Linux distro which ships with the OOM Killer disabled.

    I found (2) to be especially easy to implement, thanks to this article.
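
The linked article is not reproduced here, so the following is only one possible way to do (2), not necessarily the one it describes. On current kernels each process has a `/proc/<pid>/oom_score_adj` file holding a value from -1000 (never pick this process) to 1000 (pick it first). The helper name `set_oom_adj` is invented for this sketch, and lowering the value generally requires root.

```shell
# Hypothetical helper: set the OOM score adjustment for a PID.
# Valid range is -1000 (exempt from the OOM killer) to 1000 (preferred victim).
# Lowering the value usually needs root; raising your own does not.
set_oom_adj() {
    pid=$1
    adj=$2
    if [ "$adj" -lt -1000 ] || [ "$adj" -gt 1000 ]; then
        echo "adjustment must be between -1000 and 1000" >&2
        return 1
    fi
    # The write fails (non-zero status) if the PID is gone or permission is denied.
    echo "$adj" > "/proc/$pid/oom_score_adj"
}

# Example (PID 18515 is just the npm process from the log above):
#   sudo sh -c 'echo -500 > /proc/18515/oom_score_adj'
```

A strongly negative value only shifts the kill onto some other process, so it is worth deciding which process you would rather lose.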

  • 2020-11-22 10:17

    This looks like a good article on the subject: Taming the OOM killer.

    The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.
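
To see how a particular machine is configured, the overcommit policy is exposed in `/proc/sys/vm/overcommit_memory`, and the kernel's current score for a process in `/proc/<pid>/oom_score`. The sketch below only translates the policy number into words; the function name `overcommit_mode_name` is an assumption of this example, not a standard tool.

```shell
# Hypothetical helper: describe a vm.overcommit_memory value.
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic overcommit (default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting (no overcommit beyond the configured limit)" ;;
        *) echo "unknown" ;;
    esac
}

# On a Linux box:
#   overcommit_mode_name "$(cat /proc/sys/vm/overcommit_memory)"
#   cat /proc/self/oom_score    # score of the shell's cat process; higher means more likely to be killed
```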

    Edit: And here is another article that explains pretty well how a process is chosen (annotated with some kernel code examples). The great thing about this is that it includes some commentary on the reasoning behind the various badness() rules.

  • 2020-11-22 10:18

    If neither the user nor the sysadmin killed the program, the kernel may have. The kernel only kills a process under exceptional circumstances such as extreme resource starvation (think memory plus swap exhaustion).

  • 2020-11-22 10:20

    Try:

    dmesg -T | grep -E -i -B100 'killed process'
    

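    Here -B100 tells grep to also print the 100 lines before each match, so you see the memory dump leading up to the kill, and -T makes dmesg print human-readable timestamps.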

    Omit -T on macOS.
