I was doing a little experimentation with 2D lists and numpy arrays. From this, I've raised three questions I'm quite curious to know the answers to.
First, I initiali
You have three questions:

Which __xx__ method has numpy overridden/defined to handle fancy indexing?

The indexing operator [] is overridable using __getitem__, __setitem__, and __delitem__. It can be fun to write a simple subclass that offers some introspection:
>>> class VerboseList(list):
...     def __getitem__(self, key):
...         print(key)
...         return super().__getitem__(key)
...
Let's make an empty one first:
>>> l = VerboseList()
Now fill it with some values. Note that we haven't overridden __setitem__, so nothing interesting happens yet:
>>> l[:] = range(10)
Now let's get an item. The item at index 0 will be 0:
>>> l[0]
0
0
If we try to use a tuple, we get an error, but we get to see the tuple first!
>>> l[0, 4]
(0, 4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __getitem__
TypeError: list indices must be integers or slices, not tuple
We can also find out how Python represents slices internally:
>>> l[1:3]
slice(1, 3, None)
[1, 2]
There are lots more fun things you can do with this object -- give it a try!
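For instance, the same trick works for assignment. Here's a sketch (the class name is my own) that also overrides __setitem__, so writes are reported too:

```python
class VerboseList(list):
    """A list that reports every key passed to item access and assignment."""
    def __getitem__(self, key):
        print("get:", key)
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        print("set:", key, "->", value)
        super().__setitem__(key, value)

vl = VerboseList(range(5))
vl[1:3] = [10, 20]   # prints "set: slice(1, 3, None) -> [10, 20]"
vl[-1]               # prints "get: -1" and returns 4
```

Slice assignment shows you the same slice object that retrieval does, just routed through __setitem__ instead.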
Why don't python lists natively support fancy indexing?

This is hard to answer. One way of thinking about it is historical: because the numpy developers thought of it first.
Upon its first public release in 1991, Python had no numpy library, and to make a multi-dimensional list, you had to nest list structures. I assume that the early developers -- in particular, Guido van Rossum (GvR) -- felt that keeping things simple was best, initially. Slice indexing was already pretty powerful.

However, not too long after, interest grew in using Python as a scientific computing language. Between 1995 and 1997, a number of developers collaborated on a library called numeric, an early predecessor of numpy. Though he wasn't a major contributor to numeric or numpy, GvR coordinated with the numeric developers, extending Python's slice syntax in ways that made multidimensional array indexing easier. Later, an alternative to numeric arose called numarray; and in 2006, numpy was created, incorporating the best features of both.

These libraries were powerful, but they required heavy C extensions and so on. Working them into the base Python distribution would have made it bulky. And although GvR did enhance slice syntax a bit, adding fancy indexing to ordinary lists would have changed their API dramatically -- and somewhat redundantly. Given that fancy indexing could be had with an outside library already, the benefit wasn't worth the cost.
Parts of this narrative are speculative, in all honesty.1 I don't know the developers really! But it's the same decision I would have made. In fact...
Although fancy indexing is very powerful, I'm glad it's not part of vanilla Python even today, because it means that you don't have to think very hard when working with ordinary lists. For many tasks you don't need it, and the cognitive load it imposes is significant.
Keep in mind that I'm talking about the load imposed on readers and maintainers. You may be a whiz-bang genius who can do 5-d tensor products in your head, but other people have to read your code. Keeping fancy indexing in numpy means people don't use it unless they honestly need it, which makes code more readable and maintainable in general.
Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Possibly. It's definitely environment-dependent; I don't see the same difference on my machine.
1. The parts of the narrative that aren't as speculative are drawn from a brief history told in a special issue of Computing in Science and Engineering (2011 vol. 13).
my_list[:,] is translated by the interpreter into my_list.__getitem__((slice(None, None, None),)). It's like calling a function with *args, but it takes care of translating the : notation into a slice object. Without the , it would just pass the slice. With the , it passes a tuple.
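A minimal way to watch this translation happen is a throwaway class (my own, purely for illustration) whose __getitem__ just echoes back the key it receives:

```python
class KeyEcho:
    """Echo whatever key the [] operator passes to __getitem__."""
    def __getitem__(self, key):
        return key

k = KeyEcho()
print(k[:])          # slice(None, None, None) -- a bare colon becomes a slice
print(k[:,])         # (slice(None, None, None),) -- the comma makes a 1-tuple
print(k[:, [0, 1]])  # (slice(None, None, None), [0, 1])
```

The trailing comma is what turns the key into a tuple; the class itself does nothing special.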
The list __getitem__ does not accept a tuple, as shown by the error. An array __getitem__ does. I believe the ability to pass a tuple and create slice objects was added as a convenience for numpy (or its predecessors). The tuple notation has never been added to the list __getitem__. (There is an operator.itemgetter class that allows a form of advanced indexing, but internally it just iterates in Python code.)
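For reference, operator.itemgetter lets you fetch several items in one call, though it still only indexes the outer level of a nested list:

```python
from operator import itemgetter

row = [10, 20, 30, 40]
pick = itemgetter(0, 2)       # a callable that fetches items 0 and 2
print(pick(row))              # (10, 30)

# On a nested list it returns whole sublists; no inner-level indexing:
nested = [[1, 2, 3], [4, 5, 6]]
print(itemgetter(1)(nested))  # [4, 5, 6]
```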
With an array you can use the tuple notation directly:
In [490]: np.arange(6).reshape((2,3))[:,[0,1]]
Out[490]:
array([[0, 1],
[3, 4]])
In [491]: np.arange(6).reshape((2,3))[(slice(None),[0,1])]
Out[491]:
array([[0, 1],
[3, 4]])
In [492]: np.arange(6).reshape((2,3)).__getitem__((slice(None),[0,1]))
Out[492]:
array([[0, 1],
[3, 4]])
Look at the numpy/lib/index_tricks.py file for examples of fun stuff you can do with __getitem__. You can view the file with np.source(np.lib.index_tricks).
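Two of the helpers defined there, np.s_ and np.r_, give a quick taste (assuming numpy is installed):

```python
import numpy as np

# np.s_ exposes the objects that index syntax produces:
print(np.s_[1:3])        # slice(1, 3, None)
print(np.s_[:, [0, 1]])  # (slice(None, None, None), [0, 1])

# np.r_ concatenates slice-style ranges and scalars into one array:
print(np.r_[0:5, 10, 20])
```

Both are objects whose __getitem__ does something other than look up stored items, which is exactly the kind of trick index_tricks.py is full of.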
In a nested list, the sublists are independent of the containing list. The container just has pointers to objects elsewhere in memory:
In [494]: my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [495]: my_list
Out[495]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [496]: len(my_list)
Out[496]: 3
In [497]: my_list[1]
Out[497]: [4, 5, 6]
In [498]: type(my_list[1])
Out[498]: list
In [499]: my_list[1]='astring'
In [500]: my_list
Out[500]: [[1, 2, 3], 'astring', [7, 8, 9]]
Here I change the 2nd item of my_list; it is no longer a list, but a string.
If I apply [:] to a list I just get a shallow copy:
In [501]: xlist = my_list[:]
In [502]: xlist[1] = 43
In [503]: my_list # didn't change my_list
Out[503]: [[1, 2, 3], 'astring', [7, 8, 9]]
In [504]: xlist
Out[504]: [[1, 2, 3], 43, [7, 8, 9]]
but changing an element of a sublist through xlist does change the corresponding sublist in my_list:
In [505]: xlist[0][1]=43
In [506]: my_list
Out[506]: [[1, 43, 3], 'astring', [7, 8, 9]]
To me this shows why n-dimensional indexing (as implemented for numpy arrays) doesn't make sense with nested lists. Nested lists are multidimensional only to the extent that their contents allow; there's nothing structurally or syntactically multidimensional about them.
Using two [:] on a list does not make a deep copy or work its way down the nesting. It just repeats the shallow copy step:
In [507]: ylist=my_list[:][:]
In [508]: ylist[0][1]='boo'
In [509]: xlist
Out[509]: [[1, 'boo', 3], 43, [7, 8, 9]]
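If you actually want the nesting copied all the way down, the standard library's copy.deepcopy does it:

```python
import copy

my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shallow = my_list[:]           # new outer list, same inner list objects
deep = copy.deepcopy(my_list)  # recursively copies the inner lists too

shallow[0][0] = 99
print(my_list[0])  # [99, 2, 3] -- the shallow copy shares sublists
print(deep[0])     # [1, 2, 3]  -- the deep copy is independent
```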
arr[:,] just makes a view of arr. The difference between view and copy is part of understanding the difference between basic and advanced indexing.
So alist[:][:] and arr[:,] are different but basic ways of making some sort of copy of lists and arrays. Neither computes anything, and neither iterates through the elements, so a timing comparison doesn't tell us much.
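A quick sketch of that view/copy distinction (assuming numpy is installed): basic indexing returns a view that shares memory with the original, while advanced (fancy) indexing returns an independent copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
v = arr[:, :]          # basic indexing: a view sharing arr's memory
c = arr[:, [0, 1, 2]]  # advanced indexing: a copy with its own memory

v[0, 0] = 99
print(arr[0, 0])       # 99 -- writing through the view changed arr
c[0, 1] = -1
print(arr[0, 1])       # 1 -- the copy is independent
```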
Which __xx__ method has numpy overridden/defined to handle fancy indexing?
__getitem__ for retrieval, __setitem__ for assignment. It'd be __delitem__ for deletion, except that NumPy arrays don't support deletion.
(It's all written in C, though, so what they implemented at C level was mp_subscript and mp_ass_subscript; the __getitem__ and __setitem__ wrappers were provided by PyType_Ready. __delitem__ too, even though deletion is unsupported, because __setitem__ and __delitem__ both map to mp_ass_subscript at C level.)
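You can see the unsupported-deletion path directly: __delitem__ exists on arrays, but the C code rejects the operation at runtime (a sketch, assuming numpy; in current NumPy the rejection is a ValueError):

```python
import numpy as np

arr = np.arange(3)
try:
    del arr[0]   # routed through mp_ass_subscript with no value to assign
except ValueError as e:
    print("deletion rejected:", e)
```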
Why don't python lists natively support fancy indexing?
Python lists are fundamentally 1-dimensional structures, while NumPy arrays are arbitrary-dimensional. Multidimensional indexing only makes sense for multidimensional data structures.
You can have a list with lists as elements, like [[1, 2], [3, 4]], but the list doesn't know or care about the structure of its elements. Making lists support l[:, 2] indexing would require the list to be aware of multidimensional structure in a way that lists aren't designed to be. It would also add a lot of complexity, a lot of error handling, and a lot of extra design decisions -- how deep a copy should l[:, :] be? What happens if the structure is ragged, or inconsistently nested? Should multidimensional indexing recurse into non-list elements? What would del l[1:3, 1:3] do?
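To make the ragged-structure problem concrete, here's a naive, hypothetical two-level indexing helper (my own sketch, not any real API) and how it breaks the moment the nesting is uneven:

```python
def get_2d(nested, rows, cols):
    """Naive sketch of l[rows, cols] for a nested list.
    Only works when every selected row supports the column index."""
    return [row[cols] for row in nested[rows]]

regular = [[1, 2, 3], [4, 5, 6]]
print(get_2d(regular, slice(None), 1))   # [2, 5]

ragged = [[1, 2, 3], [4]]
try:
    get_2d(ragged, slice(None), 1)       # second row has no index 1
except IndexError:
    print("ragged rows break two-level indexing")
```

A real implementation would have to decide what to do here (error? pad? skip?), which is exactly the kind of design burden the answer is describing.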
I've seen the NumPy indexing implementation, and it's longer than the entire implementation of lists. It's not worth doing that to lists when NumPy arrays satisfy all the really compelling use cases you'd need it for.
Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?
NumPy indexing isn't a BLAS operation, so that's not it. I can't reproduce such dramatic timing differences, and the differences I do see look like minor Python 3 optimizations, maybe slightly more efficient allocation of tuples or slices. What you're seeing is probably due to NumPy version differences.
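If you want to check this on your own machines, timeit is the tool; the absolute numbers are environment-dependent, which is rather the point:

```python
import timeit

setup = "import numpy as np; arr = np.arange(10000).reshape(100, 100)"
t_basic = timeit.timeit("arr[:, :]", setup=setup, number=10000)
t_fancy = timeit.timeit("arr[:, [0, 1]]", setup=setup, number=10000)
print(f"basic (view): {t_basic:.4f}s   fancy (copy): {t_fancy:.4f}s")
```

Run the same snippet under each interpreter/NumPy combination you're comparing before drawing conclusions.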