My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.
Recently a user was executing the application for a while when it died mysteriously; the only output in the Terminal was the word Killed.
I encountered this problem recently. I eventually found that my processes were being killed just after an automatic openSUSE zypper update ran. Disabling the automatic zypper update solved my problem.
In an LSF environment (interactive or otherwise), if the application's memory utilization exceeds a threshold preset by the admins on the queue, or exceeds the resource request made when submitting to the queue, the processes will be killed so that other users don't fall victim to a potential runaway. It doesn't always send an email when it does so, depending on how it's set up.
One solution in this case is to find a queue with larger resources, or to define larger resource requirements in the submission.
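For example, a minimal sketch of a submission with larger resource requirements, assuming bsub's memory values are interpreted in MB on your cluster (the unit depends on the LSF_UNIT_FOR_LIMITS setting; the queue name and ./myapp are placeholders):

bsub -q normal -R "rusage[mem=8192]" -M 8192 ./myapp    # reserve ~8 GB and raise the memory limit to match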
You may also want to review man ulimit, although I don't remember ulimit itself resulting in Killed; it's been a while since I needed that.
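For reference, a sketch of inspecting and tightening limits in the shell that launches the application (the 4194304 KB cap is just an illustrative value):

ulimit -a             # list all current soft limits for this shell
ulimit -v 4194304     # cap virtual memory for subsequent commands at ~4 GB

Note that exceeding a memory ulimit usually shows up as failed allocations inside the program rather than a bare Killed; the hard CPU-time limit, by contrast, is enforced with SIGKILL.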
I solved this issue by increasing the swap size:
https://askubuntu.com/questions/1075505/how-do-i-increase-swapfile-in-ubuntu-18-04
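For reference, one common way to add a swap file on Ubuntu (the 1G size and the /swapfile path are just examples):

sudo fallocate -l 1G /swapfile    # allocate the swap file
sudo chmod 600 /swapfile          # restrict access to root
sudo mkswap /swapfile             # format it as swap space
sudo swapon /swapfile             # enable it immediately
# To keep it across reboots, add this line to /etc/fstab:
#   /swapfile none swap sw 0 0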
Say you have 512 MB of RAM + 1 GB of swap. So in theory, your processes have access to a total of 1.5 GB of virtual memory.
For some time everything runs fine within that 1.5 GB of total memory. But all of a sudden (or gradually) your system starts consuming more and more memory, until around 95% of the total is in use.
Now say some process requests a large chunk of memory from the kernel. The kernel checks the available memory and finds that there is no way it can allocate your process more memory. So it will try to free some memory by invoking the OOM killer (http://linux-mm.org/OOM).
The OOM killer has its own algorithm that scores and ranks every process. Typically, the process using the most memory becomes the victim to be killed.
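You can see how the kernel currently ranks processes through /proc; a rough sketch (the PID 1234 in the last command is a placeholder):

# List the top OOM-kill candidates: a higher oom_score means a more likely victim
for p in /proc/[0-9]*; do
    printf '%s %s\n' "$(cat $p/oom_score 2>/dev/null)" "$(cat $p/comm 2>/dev/null)"
done | sort -rn | head
# Make one process less attractive to the OOM killer (valid range -1000..1000)
echo -500 | sudo tee /proc/1234/oom_score_adj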
When the OOM killer does kill a process, it logs what it did, typically in the /var/log directory: either /var/log/kern.log or /var/log/dmesg.
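A quick sketch of searching those logs for the kernel's OOM messages (the exact message wording varies a little between kernel versions):

sudo dmesg | grep -iE 'out of memory|killed process'
sudo grep -i 'killed process' /var/log/kern.log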
Hope this helps.
The PAM module that limits resources caused exactly the result you described: my process died mysteriously with the text Killed on the console window, with no log output in either syslog or kern.log. The top program helped me discover that my process was being killed after exactly one minute of CPU usage.
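For anyone hitting the same symptom, a sketch of what such a pam_limits setup looks like (someuser is a placeholder; in /etc/security/limits.conf the cpu item is measured in minutes, and exceeding the hard limit is enforced with SIGKILL):

ulimit -t    # CPU-time limit, in seconds, inherited by your shell; "unlimited" means no limit
# A line like this in /etc/security/limits.conf caps CPU time at 1 minute:
#   someuser  hard  cpu  1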
A tool like systemtap (or another tracer) can monitor the kernel's signal-transmission logic and report who sent the fatal signal, e.g., https://sourceware.org/systemtap/examples/process/sigmon.stp:
# stap .../sigmon.stp -x 31994 SIGKILL
   SPID     SNAME   RPID  RNAME    SIGNUM SIGNAME
   5609      bash  31994   find         9 SIGKILL
The filtering if block in that script can be adjusted to taste, or eliminated to trace system-wide signal traffic. Causes can be further isolated by collecting backtraces (add a print_backtrace() and/or print_ubacktrace() to the probe, for kernel space and user space respectively).