How do I yield an object from a generator and forget it immediately, so that it doesn't take up memory?
For example, in the following function:
def grou
Several points in this thread were perplexing me. I realize that I had failed to understand the basic point: what your problem actually was.
Now I think I have understood, and I would like you to confirm.
I'll represent your code like this:
import itertools

def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        chunk = list(itertools.islice(i, int(chunksize)))
        if not chunk:
            break
        yield chunk
............
............
gigi = grouper(an_iterable,4)
# before A
# A = grouper(an_iterable,4)
# corrected:
A = gigi.next()
# after A
................
...........
# deducing an object x from A ; x doesn't consume a lot of memory
............
# deleting A because it consumes a lot of memory:
del A
# code carries on, taking time to execute
................
................
......
..........
# before B
# B = grouper(an_iterable,4)
# corrected:
B = gigi.next()
# after B
.....................
........
Your problem is that, during the whole time elapsing between
# after deletion of A, code carries on, taking time to execute
and
# before B ,
the object formerly named 'A' still exists and consumes a lot of memory, because there is still a binding between this object and the identifier 'chunk' inside the generator function?
Excuse me for asking you about this point, which is now evident to me.
However, since there was some confusion in the thread at one point, I'd like you to confirm that I have now correctly understood your problem.
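To make the situation I describe concrete, here is a minimal sketch of it. It is written for Python 3 (hence next(gigi) instead of gigi.next()) and it assumes CPython's reference counting; other interpreters may behave differently. The Chunk class and the names probe, alive_after_del and alive_after_next are mine, introduced only because a plain list cannot be weakly referenced; they are not part of your code.

```python
import gc
import itertools
import weakref


class Chunk(list):
    """Hypothetical list subclass, needed only so that weakref.ref() works."""


def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        chunk = Chunk(itertools.islice(i, int(chunksize)))
        if not chunk:
            break
        yield chunk


gigi = grouper(range(10), 4)

A = next(gigi)
probe = weakref.ref(A)   # observes the object without keeping it alive

del A                    # the caller gives up its only name for the object
gc.collect()
# On CPython the object survives: the suspended generator frame
# still binds its local name 'chunk' to it.
alive_after_del = probe() is not None

B = next(gigi)           # the generator resumes and rebinds 'chunk'
gc.collect()
# Now the last reference is gone and the object has been freed.
alive_after_next = probe() is not None

print(alive_after_del, alive_after_next)
```

If I have understood your problem correctly, the interval in which alive_after_del is true corresponds to the period between "after deletion of A" and "before B" in the outline above.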
.
You wrote in a comment:
1)
After the yield chunk, there is no way to access the value stored in chunk from this function. Therefore, this function does not hold any references to the object in question
(By the way, I would have written 'because' rather than 'therefore'.)
I think that this claim #1 is debatable.
In fact, I'm convinced it is false. But there is a subtlety in what you claim, not in this quotation alone but overall, if we also take into account what you say at the beginning of your answer.
Let us take things in order.
The following code seems to prove the contrary of your claim "After the yield chunk, there is no way to access the value stored in chunk from this function."
import itertools

def grouper(iterable, chunksize):
    i = iter(iterable)
    chunk = ''
    last = ''
    while True:
        print 'new turn ',id(chunk)
        if chunk:
            last = chunk[-1]
        chunk = list(itertools.islice(i, int(chunksize)))
        print 'new chunk ',id(chunk),' len of chunk :',len(chunk)
        if not chunk:
            break
        yield '%s - %s' % (last,' , '.join(chunk))
        print 'end of turn',id(chunk),'\n'

for x in grouper(['1','2','3','4','5','6','7','8','9','10','11'],'4'):
    print repr(x)
result
new turn 10699768
new chunk 18747064 len of chunk : 4
' - 1 , 2 , 3 , 4'
end of turn 18747064
new turn 18747064
new chunk 18777312 len of chunk : 4
'4 - 5 , 6 , 7 , 8'
end of turn 18777312
new turn 18777312
new chunk 18776952 len of chunk : 3
'8 - 9 , 10 , 11'
end of turn 18776952
new turn 18776952
new chunk 18777512 len of chunk : 0
.
However, you also wrote (it's the beginning of your answer):
2)
After yield chunk, the variable value is never used again in the function, so a good interpreter/garbage collector will already free chunk for garbage collection (note: cpython 2.7 seems not do this, pypy 1.6 with default gc does).
This time you don't say that the function no longer holds a reference to chunk after yield chunk; you say that its value is not used again before chunk is renewed in the next turn of the while loop. That's right: in Radim's code, the object chunk isn't used again before the identifier 'chunk' is re-assigned by the instruction chunk = list(itertools.islice(i, int(chunksize))) in the next turn of the loop.
.
Claim #2 in this quotation, which differs from the preceding claim #1, has two logical consequences:
FIRST, my code above can't strictly prove, to someone who thinks as you do, that there is indeed a way to access the value of chunk after the yield chunk instruction.
That is because the conditions in my code are not the ones under which you assert the contrary: in Radim's code, which is the code you are talking about, the object chunk really isn't used again before the next turn.
One can then argue that it is because of the uses of chunk in my code (the instructions print 'end of turn',id(chunk),'\n' , print 'new turn ',id(chunk) and last = chunk[-1] do use it) that a reference to the object chunk is still held after the yield chunk.
SECONDLY, going further in the reasoning, putting your two quotations together leads to the conclusion that you think it is because chunk is no longer used after the yield chunk instruction in Radim's code that no reference to it is kept.
It's a matter of logic, IMO: the absence of any reference to an object is the condition for freeing it. Hence, if you claim that the object's memory is freed because the object is no longer used, that is equivalent to claiming that its memory is freed because its disuse makes the interpreter delete the reference to it in the function.
To sum up:
you claim that in Radim's code chunk is no longer used after yield chunk, so no reference to it is held any more, so..... cpython 2.7 won't do it... but pypy 1.6 with default gc frees the memory occupied by the object chunk.
At this point, I'm very surprised by the reasoning behind this consequence: it would be because chunk is no longer used that pypy 1.6 frees it. You don't state this reasoning explicitly, but without it I would find what you claim in the two quotations illogical and incomprehensible.
What perplexes me in this conclusion, and the reason I don't agree with all that, is that it implies that pypy 1.6 is able to analyze the code and detect that chunk won't be used again after yield chunk. I find this idea completely unbelievable, and I would like you:
to explain exactly what you think about all that. Where am I wrong in my understanding of your ideas?
to say whether you have proof that pypy 1.6, at least, doesn't hold a reference to chunk when it is no longer used.
The problem with Radim's initial code was that too much memory was consumed by the persistence of the object chunk, caused by the reference to it still held inside the generator function: that was an indirect symptom of the existence of such a persistent reference inside.
Have you observed similar behavior with pypy 1.6? I don't see another way to bring to light the remaining reference inside the generator since, according to your quotation #2, any use of chunk after yield chunk is enough to cause a reference to it to be kept. It's a problem similar to one in quantum mechanics: measuring the speed of a particle modifies its speed.....
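One possible way to look for the remaining reference without adding any use of chunk to the body of the function is to inspect the suspended generator from the outside, through its gi_frame attribute. The following is only a sketch, written for Python 3 and tried in thought against CPython; I have not checked how pypy 1.6 behaves here. The name same_object is mine. Note that the measurement problem does not entirely disappear: on some CPython versions, reading f_locals may itself create a dictionary that keeps a reference to the object, so this shows that the binding exists but it is not a neutral observation.

```python
import itertools


def grouper(iterable, chunksize):
    i = iter(iterable)
    while True:
        chunk = list(itertools.islice(i, int(chunksize)))
        if not chunk:
            break
        yield chunk


gigi = grouper(range(10), 4)
A = next(gigi)

# The generator is suspended at 'yield chunk'. Its frame is reachable
# from outside, and its local name 'chunk' is bound to the very object
# that the caller received.
same_object = gigi.gi_frame.f_locals['chunk'] is A

print(same_object)
```

The body of grouper is unchanged from Radim's version here, so the objection that my extra print instructions are what keep the reference alive does not apply to this check.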