Almost every tutorial and SO answer on this topic insists that you should never modify a list while iterating over it, but I can't see why this is such a bad thing if the c
    while len(mylist) > 0:
        print(mylist.pop())
You are not iterating over the list. You are simply re-evaluating one condition (is the list non-empty?) before each pass through the loop.
Also:
while len(mylist) > 0:
can be rewritten as:
while len(mylist):
which can be rewritten as:
while mylist:
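As a quick sanity check, here is a minimal sketch of the shortest spelling in action. The helper name drain is made up for illustration and is not part of the question's code:

```python
def drain(mylist):
    """Pop items off the end until the list is empty; return them in pop order."""
    popped = []
    while mylist:  # an empty list is falsy, so this matches len(mylist) > 0
        popped.append(mylist.pop())
    return popped

items = [1, 2, 3]
result = drain(items)
print(result)  # [3, 2, 1]
print(items)   # []
```

No list iterator is involved here, so nothing can get out of step with the list as it shrinks.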
Your code doesn't iterate over the list. This would be iterating over it:
    for i in mylist:
        print(mylist.pop())
I'll go into a little bit more detail about why you shouldn't modify a list while iterating over it. Naturally, by that I mean
    for elt in my_list:
        my_list.pop()
or similar idioms.
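To see what goes wrong with that idiom, here is a small sketch. The six-element sample list and the visited name are my own assumptions, chosen only to make the effect visible:

```python
my_list = [1, 2, 3, 4, 5, 6]
visited = []
for elt in my_list:
    visited.append(elt)
    my_list.pop()  # removes the last element on every pass

print(visited)  # [1, 2, 3]
print(my_list)  # [1, 2, 3]
```

The loop stops after three passes because the list has shrunk to meet the iterator's position, so half of the elements are never visited.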
First, we need to think about what Python's for loop does. Since you can attempt to iterate over any object, Python doesn't necessarily know how to iterate over whatever you've given it. So there is a list (heh) of things it tries in order to work out how to present the values one by one. The first thing it does is check for an __iter__ method on the object and, if it exists, call it.
The result of this call is an iterator object; that is, one with a next method (spelled __next__ in Python 3). Now we're good to go: just call next repeatedly until StopIteration is raised.
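The protocol just described can be written out by hand. This is only a rough sketch of what a for loop does, and manual_for is a hypothetical name; the built-in iter() and next() functions call the object's __iter__ and next/__next__ methods for you:

```python
def manual_for(iterable):
    """Roughly what a for loop does: get an iterator, call next() until StopIteration."""
    seen = []
    it = iter(iterable)  # calls iterable.__iter__()
    while True:
        try:
            value = next(it)  # calls it.next() in Python 2, it.__next__() in Python 3
        except StopIteration:
            break  # the iterator is exhausted, so the loop ends
        seen.append(value)  # stands in for the loop body
    return seen

print(manual_for([10, 20, 30]))  # [10, 20, 30]
```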
Why is this important? Well, because the iterator returned by __iter__ has to look at the data structure to find the values, and remember some internal state so that it knows where to look next. But if you change the data structure, the iterator has no way of knowing that you've been fiddling, so it will blithely keep on trying to grab new data from wherever its state points. What this means in practice is that you will probably skip elements of the list.
It's always nice to justify this sort of claim with a look at the source code. From listobject.c:
    static PyObject *
    listiter_next(listiterobject *it)
    {
        PyListObject *seq;
        PyObject *item;

        assert(it != NULL);
        seq = it->it_seq;
        if (seq == NULL)
            return NULL;
        assert(PyList_Check(seq));

        if (it->it_index < PyList_GET_SIZE(seq)) {
            item = PyList_GET_ITEM(seq, it->it_index);
            ++it->it_index;
            Py_INCREF(item);
            return item;
        }

        Py_DECREF(seq);
        it->it_seq = NULL;
        return NULL;
    }
Note in particular that it really does simulate a C-style for loop, with it->it_index playing the part of the index variable. If you delete an item at or before the current position, every later item shifts down one slot, but it_index is not adjusted to match, so the item that slides into the slot you just visited gets skipped.
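For those who don't read C, here is a rough pure-Python imitation of that function. ListIterSketch and its attribute names are my own invention, meant only to mirror it_seq and it_index; it is not how CPython exposes the iterator:

```python
class ListIterSketch(object):
    """Illustrative stand-in for the C list iterator: a sequence plus an index."""

    def __init__(self, seq):
        self.seq = seq    # plays the part of it->it_seq
        self.index = 0    # plays the part of it->it_index

    def __iter__(self):
        return self

    def __next__(self):
        if self.seq is None:
            raise StopIteration
        if self.index < len(self.seq):
            item = self.seq[self.index]
            self.index += 1  # advances no matter what happened to the list
            return item
        self.seq = None      # exhausted: drop the reference, like the C code
        raise StopIteration

    next = __next__  # Python 2 spelling


data = ['a', 'b', 'c', 'd']
seen = []
for item in ListIterSketch(data):
    seen.append(item)
    if item == 'b':
        data.remove('b')  # 'c' slides into index 1, but index is already 2

print(seen)  # ['a', 'b', 'd']
```

The element 'c' is never seen, which is the skipping behaviour described above.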
The reason you should never modify a list while iterating over it is best shown with an example. Say you're iterating over a list of 20 numbers, and whenever you hit an even number you pop it off the list and carry on, until you have a list of just odd numbers.
Now, say this is your sample data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], and you start iterating over it. On the first iteration the number is 1, so you continue; the following number is 2, so you pop it off; rinse and repeat. You now feel the application worked correctly, as the resultant list is [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]. It only worked by luck, though: each removal makes the loop skip the next element, and in this data the next element is always odd.
Now let's say your sample data is [1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 15, 15, 17, 18, 20] and you run the same piece of code as before, mutating the original list while iterating through it. Your resultant list is [1, 4, 5, 7, 10, 11, 13, 15, 15, 17, 20], which is clearly incorrect, as there are still even numbers in the list: 4, 10 and 20 each came directly after a removed number, so they were skipped.
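Both runs can be reproduced with a short sketch. I'm assuming here that "pop it off" means something like lst.remove(elem); the function name drop_evens_in_place is made up for this example:

```python
def drop_evens_in_place(lst):
    """Buggy on purpose: removes even numbers from lst while iterating over it."""
    for elem in lst:
        if elem % 2 == 0:
            lst.remove(elem)  # later items shift left; the iterator's index does not
    return lst

first = drop_evens_in_place(list(range(1, 21)))
print(first)
# [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

second = drop_evens_in_place([1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 15, 15, 17, 18, 20])
print(second)
# [1, 4, 5, 7, 10, 11, 13, 15, 15, 17, 20]
```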
If you are planning on mutating the list while iterating through it, like so:
    for elem in lst:
        # mutate list in place
you should rather change it to:
    for elem in lst[:]:
        # mutate list in place
The [:] syntax creates a new list that is a shallow copy of the original, so you can happily mutate the original list without affecting the sequence you're actually stepping through, and you avoid the unintended side effects of mutating the list you're iterating over.
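Applied to the second sample list from earlier, a minimal sketch looks like this (again assuming the mutation is lst.remove(elem)):

```python
lst = [1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 15, 15, 17, 18, 20]
for elem in lst[:]:       # iterate over a shallow copy
    if elem % 2 == 0:
        lst.remove(elem)  # mutate the original; the copy is unaffected

print(lst)  # [1, 5, 7, 11, 13, 15, 15, 17]
```

This time every even number is gone, because the iterator's index refers to the copy, which never changes.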
If your list is rather sizable, then instead of copying the whole list and stepping through the copy, look at using a generator expression that yields only the elements you want to keep, or write your own generator for your list if you feel the need, so that you do not waste memory and CPU cycles on a full copy.
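One way this might look is sketched below. The keep_odds generator is a hypothetical helper, and the slice assignment relies on my understanding that the right-hand side is fully consumed before the list's contents are replaced, at least in CPython:

```python
def keep_odds(numbers):
    """Hypothetical generator: lazily yield only the odd values."""
    for n in numbers:
        if n % 2:
            yield n

lst = [1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 15, 15, 17, 18, 20]

# Consume lazily, e.g. to aggregate, without building any new list.
odd_total = sum(keep_odds(lst))

# Or rebuild the list's contents in place from a generator expression.
lst[:] = (n for n in lst if n % 2)

print(odd_total)  # 84
print(lst)        # [1, 5, 7, 11, 13, 15, 15, 17]
```

Neither form removes items from the list while a loop is stepping through it, so nothing gets skipped.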