What I need to accomplish:
Given a binary file, decode it in a couple of different ways, each providing a TextIOBase API. Ideally these subseque
EDIT:
I found a much better solution (comparatively), but I will leave this answer in the event it is useful for anyone to learn from. (It is a pretty easy way to show off gc.garbage.)
Please do not actually use what follows.
OLD:
I found a potential solution, though it is horrible:
What we can do is set up a cyclic reference in the destructor, which will hold off the GC event. We can then look at gc.garbage to find these uncollectable objects, break the cycle, and drop the reference.
In [1]: import io

In [2]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         if not hasattr(self, '_cycle'):
   ...:             print "holding off GC"
   ...:             self._cycle = self
   ...:         else:
   ...:             print "getting GCed!"
   ...:

In [3]: def mangle(x):
   ...:     MyTextIOWrapper(x)
   ...:

In [4]: f = io.open('example', mode='rb')

In [5]: mangle(f)
holding off GC

In [6]: f.closed
Out[6]: False

In [7]: import gc

In [8]: gc.garbage
Out[8]: []

In [9]: gc.collect()
Out[9]: 34

In [10]: gc.garbage
Out[10]: [<_io.TextIOWrapper name='example' encoding='UTF-8'>]

In [11]: gc.garbage[0]._cycle=False

In [12]: del gc.garbage[0]
getting GCed!

In [13]: f.closed
Out[13]: True
Truthfully, this is a pretty horrific workaround, but it could be made transparent to the API I am delivering. Still, I would prefer a way to override the __del__ of IOBase.
Just detach your TextIOWrapper() object before letting it be garbage collected:
def mangle(x):
    wrapper = io.TextIOWrapper(x)
    wrapper.detach()
The TextIOWrapper() object only closes streams it is attached to. If you can't alter the code where the object goes out of scope, then simply keep a reference to the TextIOWrapper() object locally and detach at that point.
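For example, a minimal sketch of that pattern (the read_header() helper and the encoding are made up for illustration):

import io

def read_header(binary_stream):
    # Hypothetical helper: wrap, read what we need, then detach so that
    # garbage-collecting the wrapper never closes binary_stream.
    wrapper = io.TextIOWrapper(binary_stream, encoding='utf-8')
    try:
        return wrapper.readline()
    finally:
        wrapper.detach()

After detach() the wrapper itself is no longer usable, but the underlying binary stream stays open for whatever decoding pass comes next.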
If you must subclass TextIOWrapper(), then just call detach() in the __del__ hook:
class DetachingTextIOWrapper(io.TextIOWrapper):
    def __del__(self):
        self.detach()
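A quick check, mirroring the mangle()/f session from the question (assuming the same 'example' file):

import io

def mangle(x):
    DetachingTextIOWrapper(x)   # dropped immediately, collected soon after

f = io.open('example', mode='rb')
mangle(f)
# f.closed is still False: __del__ only detached the buffer.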
A simple solution would be to return the variable from the function and store it at script scope, so that it does not get garbage collected until the script ends or the reference to it is rebound. But there may be more elegant solutions out there.
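A rough sketch of that idea, reusing the names from the session above (the 'example' file is assumed to exist):

import io

def mangle(x):
    return io.TextIOWrapper(x)   # hand the wrapper back instead of dropping it

f = io.open('example', mode='rb')
wrapper = mangle(f)              # kept alive at script scope
# ... use wrapper ...
# f stays open until `wrapper` is deleted or rebound.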
EDIT:
Just call detach first, thanks martijn-pieters!
It turns out there is basically nothing that can be done about the destructor calling close in Python 2.7; this is hardcoded into the C code. Instead, we can modify close so that it won't close the buffer while __del__ is happening (__del__ is executed before _PyIOBase_finalize in the C code, giving us a chance to change the behaviour of close). This lets close work as expected without letting the GC close the buffer.
class SaneTextIOWrapper(io.TextIOWrapper):
    def __init__(self, *args, **kwargs):
        self._should_close_buffer = True
        super(SaneTextIOWrapper, self).__init__(*args, **kwargs)

    def __del__(self):
        # Accept the inevitability of the buffer being closed by the destructor
        # because of this line in Python 2.7:
        # https://github.com/python/cpython/blob/2.7/Modules/_io/iobase.c#L221
        self._should_close_buffer = False
        self.close()  # Actually close for Python 3 because it is an override.
                      # We can't call super because Python 2 doesn't actually
                      # have a `__del__` method for IOBase (hence this
                      # workaround). Close is idempotent so it won't matter
                      # that Python 2 will end up calling this twice.

    def close(self):
        # We can't stop Python 2.7 from calling close in the destructor,
        # so instead we prevent the buffer from being closed with a flag.
        # Based on:
        # https://github.com/python/cpython/blob/2.7/Lib/_pyio.py#L1586
        # https://github.com/python/cpython/blob/3.4/Lib/_pyio.py#L1615
        if self.buffer is not None and not self.closed:
            try:
                self.flush()
            finally:
                if self._should_close_buffer:
                    self.buffer.close()
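With that in place, the failing case from the question should behave (same hypothetical 'example' file as before):

import io

def mangle(x):
    SaneTextIOWrapper(x)    # wrapper is collected when mangle() returns

f = io.open('example', mode='rb')
mangle(f)
# f.closed is still False: __del__ flushed but skipped buffer.close().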
My previous solution here used _pyio.TextIOWrapper, which is slower than the above because it is written in Python rather than C. It involved simply overriding __del__ with a no-op, which will also work in Py2/3.
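For reference, that earlier approach looked roughly like this (the class name is made up; _pyio is the pure-Python implementation of the io module):

import _pyio

class NoopDelTextIOWrapper(_pyio.TextIOWrapper):
    def __del__(self):
        # _pyio.IOBase.__del__ normally calls self.close(); overriding it
        # with a no-op means nothing closes the underlying buffer on GC.
        pass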