Python script terminated by SIGKILL rather than throwing MemoryError

吃可爱长大的小学妹 提交于 2019-12-10 14:56:04

问题


Update Again

I have tried to create some simple way to reproduce this, but have not been successful.

So far, I have tried various simple array allocations and manipulations, but they all throw an MemoryError rather than just SIGKILL crashing.

For example:

x =np.asarray(range(999999999))

or:

x = np.empty([100,100,100,100,7])

just throw MemoryErrors as they should.

I hope to have a simple way to recreate this at some point.

End Update

I have a python script running numpy/scipy and some custom C extensions.

On my Ubuntu 14.04 under Virtual Box, it runs to completion just fine.

On an Amazon EC2 T2 micro instance, it terminates (after running a while) with the output:

Killed

Running under the python debugger, the signal is not caught and the debugger exits as well.

Running under strace, I get:

munmap(0x7fa5b7fa6000, 67112960)        = 0
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5b7fa6000    
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5affa4000    
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5abfa3000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a7f22000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a3ea1000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa59fe20000    
gettimeofday({1406518336, 306209}, NULL) = 0    
gettimeofday({1406518336, 580022}, NULL) = 0    
+++ killed by SIGKILL +++

running under gdb while trying to catch "SIGKILL", I get:

[Thread 0x7fffe7148700 (LWP 28022) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) where
No stack.

running python's trace module (python -m trace --trace ), I get:

defmatrix.py(292):         if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(293):         ndim = self.ndim
defmatrix.py(294):         if (ndim == 2):
defmatrix.py(295):             return
defmatrix.py(336):         return out
 --- modulename: linalg, funcname: norm
linalg.py(2052):     x = asarray(x)
 --- modulename: numeric, funcname: asarray
numeric.py(460):     return array(a, dtype, copy=False, order=order)

I can't think of anything else at the moment to figure out what is going on.

I suspect maybe it might be running out of memory (it is an AWS Micro instance), but I can't figure out how to confirm or deny that.

Is there another tool I could use that might help pinpoint exactly where the program is stopping? (or I am running one of the above tools the wrong way for this problem?)

Update

The Amazon EC2 T2 micro instance has no swap space defined by default, so I added a 4GB swap file and was able to run the program to completion.

However, I am still very interested in a way to have run the program such that it terminated with some message a little closer to "Not Enough Memory" rather than "Killed"

If anyone has any suggestions, they would be appreciated.


回答1:


It sounds like you've run into the dreaded Linux OOM Killer. When the system completely runs of out of memory and the kernel absolutely needs to allocate memory, it kills a process rather than crashing the entire system.

Look in the syslog for confirmation of this. A line similar to:

kernel: [884145.344240] mysqld invoked oom-killer:

followed sometime later with:

kernel: [884145.344399] Out of memory: Kill process 3318

Should be present (in this example, it mentions mysql specifically)

You can add these lines to your /etc/sysctl.conf file to effectively disable the OOM killer:

vm.overcommit_memory = 2
vm.overcommit_ratio = 100

And then reboot. Now, the original, memory hungry, process should fail to allocate memory and, hopefully, throw the proper exception.

Setting overcommit_memory means that Linux won't over commit memory, meaning memory allocations will fail if there isn't enough memory for them. See this answer for details on what effect the overcommit_ratio has: https://serverfault.com/a/510857



来源:https://stackoverflow.com/questions/25000496/python-script-terminated-by-sigkill-rather-than-throwing-memoryerror

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!