Question
Update Again
I have tried to create some simple way to reproduce this, but have not been successful.
So far, I have tried various simple array allocations and manipulations, but they all raise a MemoryError rather than being killed by SIGKILL.
For example:
x = np.asarray(range(999999999))
or:
x = np.empty([100,100,100,100,7])
just throw MemoryErrors as they should.
I hope to have a simple way to recreate this at some point.
End Update
I have a Python script that uses numpy/scipy and some custom C extensions.
On my Ubuntu 14.04 under VirtualBox, it runs to completion just fine.
On an Amazon EC2 T2 micro instance, it terminates (after running a while) with the output:
Killed
Running under the python debugger, the signal is not caught and the debugger exits as well.
Running under strace, I get:
munmap(0x7fa5b7fa6000, 67112960) = 0
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5b7fa6000
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5affa4000
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5abfa3000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a7f22000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a3ea1000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa59fe20000
gettimeofday({1406518336, 306209}, NULL) = 0
gettimeofday({1406518336, 580022}, NULL) = 0
+++ killed by SIGKILL +++
Running under gdb while trying to catch "SIGKILL", I get:
[Thread 0x7fffe7148700 (LWP 28022) exited]
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) where
No stack.
Running Python's trace module (python -m trace --trace), I get:
defmatrix.py(292): if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(293): ndim = self.ndim
defmatrix.py(294): if (ndim == 2):
defmatrix.py(295): return
defmatrix.py(336): return out
--- modulename: linalg, funcname: norm
linalg.py(2052): x = asarray(x)
--- modulename: numeric, funcname: asarray
numeric.py(460): return array(a, dtype, copy=False, order=order)
I can't think of anything else to try at the moment to figure out what is going on.
I suspect it is running out of memory (it is an AWS micro instance), but I can't figure out how to confirm or rule that out.
Is there another tool I could use that might help pinpoint exactly where the program is stopping? (Or am I running one of the above tools the wrong way for this problem?)
Update
The Amazon EC2 T2 micro instance has no swap space defined by default, so I added a 4GB swap file and was able to run the program to completion.
However, I am still very interested in a way to run the program so that it terminates with a message a little closer to "Not Enough Memory" rather than "Killed".
If anyone has any suggestions, they would be appreciated.
Answer 1:
It sounds like you've run into the dreaded Linux OOM Killer. When the system runs completely out of memory and the kernel absolutely needs to allocate memory, it kills a process rather than crashing the entire system.
Look in the syslog for confirmation of this. You should see a line similar to:
kernel: [884145.344240] mysqld invoked oom-killer:
followed some time later by:
kernel: [884145.344399] Out of memory: Kill process 3318
(This example happens to mention mysqld; your log will name different processes.)
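If the log is large, a small script can do the searching. The following is only a sketch: the two patterns are modelled on the example lines above, the exact wording varies between kernel versions, and the function names and the /var/log/syslog default are my own assumptions.

```python
import re

# Patterns modelled on the two example kernel lines quoted above.
# Assumption: your kernel words its messages similarly; adjust if not.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer"),
    re.compile(r"Out of memory: Kill(ed)? process \d+"),
]


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer messages."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]


def scan_log(path="/var/log/syslog"):
    """Scan a log file (hypothetical default path) for OOM-killer messages."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Calling scan_log() on the affected machine (reading the log may require root) and printing the result should show whether the kernel killed the Python process.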
You can add these lines to your /etc/sysctl.conf file to effectively disable the OOM killer:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
Then reboot. The original memory-hungry process should now fail to allocate memory and, hopefully, raise the proper exception.
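To check that the settings took effect after the reboot, you can read them back from /proc/sys/vm. A minimal sketch, assuming a Linux system; the function name and the proc_dir parameter are mine, the latter only so the function can be pointed at another directory for testing.

```python
import os


def read_overcommit_settings(proc_dir="/proc/sys/vm"):
    """Read the current overcommit sysctl values as integers."""
    settings = {}
    for name in ("overcommit_memory", "overcommit_ratio"):
        with open(os.path.join(proc_dir, name)) as fh:
            settings[name] = int(fh.read().strip())
    return settings
```

With the two lines above in place, read_overcommit_settings() should return {'overcommit_memory': 2, 'overcommit_ratio': 100}.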
Setting overcommit_memory to 2 means that Linux won't overcommit memory, so memory allocations will fail if there isn't enough memory for them. See this answer for details on the effect of overcommit_ratio: https://serverfault.com/a/510857
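As a rough illustration of how the ratio matters: in mode 2 the kernel caps total committed memory at roughly swap plus overcommit_ratio percent of physical RAM (excluding huge pages), which it reports as CommitLimit in /proc/meminfo. The sketch below reproduces that arithmetic under that assumption; the function names are my own and the numbers in the usage note are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate the commit limit used when vm.overcommit_memory = 2."""
    usable_ram_kb = mem_total_kb - hugetlb_kb
    return swap_total_kb + usable_ram_kb * overcommit_ratio // 100
```

For a hypothetical machine with 1,000,000 kB of RAM and no swap, commit_limit_kb(1000000, 0, 100) gives 1,000,000 kB, while a ratio of 50 would cap allocations at half of RAM, so on a machine without swap a ratio of 100 is the more practical choice.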
Source: https://stackoverflow.com/questions/25000496/python-script-terminated-by-sigkill-rather-than-throwing-memoryerror