array-indexing

numpy indexing: shouldn't trailing Ellipsis be redundant?

耗尽温柔 — Mon, 30 Dec 2019 20:56:07 +0000

Question

While trying to properly understand numpy indexing rules I stumbled across the following. I used to think that a trailing Ellipsis in an index does nothing. Trivial, isn't it? Except it's not actually true:

Python 3.5.2 (default, Nov 11 2016, 04:18:53) 
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> 
>>> D2 = np.arange(4).reshape((2, 2))
>>>
>>> D2[[1, 0]].shape; D2[[1, 0], ...].shape
(2, 2)
(2, 2)
>>> D2[:, [1, 0]].shape; D2[:, [1, 0], ...].shape
(2, 2)
(2, 2)
>>> # so far so expected; now
... 
>>> D2[[[1, 0]]].shape; D2[[[1, 0]], ...].shape
(2, 2)
(1, 2, 2)
>>> # ouch!
...
>>> D2[:, [[1, 0]]].shape; D2[:, [[1, 0]], ...].shape
(2, 1, 2)
(2, 1, 2)

Now could someone in the know advise me as to whether this is a bug or a feature? And if the latter, what's the rationale?

Thanks in advance, Paul

Answer 1 (hpaulj):

Evidently there's some ambiguity in the interpretation of the [[1, 0]] index. Possibly the same thing discussed here:

Advanced slicing when passed list instead of tuple in numpy

I'll try a different array, to see if it makes things any clearer:

In [312]: D2=np.array([[0,0],[1,1],[2,2]])
In [313]: D2
Out[313]: 
array([[0, 0],
       [1, 1],
       [2, 2]])

In [316]: D2[[[1,0,0]]]
Out[316]: 
array([[1, 1],
       [0, 0],
       [0, 0]])
In [317]: _.shape
Out[317]: (3, 2)

Adding : or ..., or making the index list an array, causes it to be treated as a (1,3) index, and the dimensions of the result expand accordingly:

In [318]: D2[[[1,0,0]],:]
Out[318]: 
array([[[1, 1],
        [0, 0],
        [0, 0]]])
In [319]: _.shape
Out[319]: (1, 3, 2)
In [320]: D2[np.array([[1,0,0]])]
Out[320]: 
array([[[1, 1],
        [0, 0],
        [0, 0]]])
In [321]: _.shape
Out[321]: (1, 3, 2)

Note that if I apply transpose to the indexing array I get a (3,1,2) result

In [323]: D2[np.array([[1,0,0]]).T,:]
...
In [324]: _.shape
Out[324]: (3, 1, 2)
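
The shapes above seem to follow one rule: when a single integer array indexes the first axis, the result shape is the index array's shape followed by the remaining axes. A minimal sketch of that rule, using an explicit ndarray so that no list/tuple interpretation is involved (the variable names are mine, not from the session above):

```python
import numpy as np

D2 = np.array([[0, 0], [1, 1], [2, 2]])

# An explicit ndarray index is never unpacked like a tuple,
# so its shape is unambiguous.
idx = np.array([[1, 0, 0]])      # shape (1, 3)

row_result = D2[idx]             # index shape (1, 3) + remaining axis (2,)
col_result = D2[idx.T, :]        # index shape (3, 1) + remaining axis (2,)

print(row_result.shape)          # (1, 3, 2)
print(col_result.shape)          # (3, 1, 2)
print(row_result.shape == idx.shape + D2.shape[1:])  # True
```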

Without : or ..., it appears to strip off one layer of [] before applying it to the 1st axis:

In [330]: D2[[1,0,0]].shape
Out[330]: (3, 2)
In [331]: D2[[[1,0,0]]].shape
Out[331]: (3, 2)
In [333]: D2[[[[1,0,0]]]].shape
Out[333]: (1, 3, 2)
In [334]: D2[[[[[1,0,0]]]]].shape
Out[334]: (1, 1, 3, 2)
In [335]: D2[np.array([[[[1,0,0]]]])].shape
Out[335]: (1, 1, 1, 3, 2)

I think there's a backward compatibility issue here. We know that the tuple layer is 'redundant': D2[(1,2)] is the same as D2[1,2]. But for compatibility with early versions of numpy (Numeric), that first [] layer may be treated the same way as a tuple.

In that November question, I noted:

So at the top level a list and a tuple are treated the same, if the list can't be interpreted as an advanced indexing list.

The addition of a ... is another way of separating the D2[[[0,1]]] from D2[([0,1],)].

From @eric's pull request, seburg explains:

The tuple normalization is a rather small thing (it basically checks for a non-array sequence of length <= np.MAXDIMS, and if it contains another sequence, slice or None consider it a tuple).

[[1,2]] is a 1 element list with a list, so it is considered a tuple, i.e. ([1,2],). [[1,2]],... is a tuple already.
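
To make the quoted rule concrete, here is a rough paraphrase of it in Python. This is only a sketch of the description above, not NumPy's actual code: the helper name is hypothetical, MAXDIMS = 32 is an assumed stand-in for np.MAXDIMS, and "non-array sequence" is narrowed to plain lists for simplicity. The last lines show two spellings that leave nothing for such a heuristic to decide.

```python
import numpy as np

MAXDIMS = 32  # assumed limit, standing in for np.MAXDIMS in the quote


def legacy_unpacks_like_tuple(index):
    """Hypothetical paraphrase of the quoted rule; not NumPy's actual code."""
    if isinstance(index, tuple):
        return True                      # already a tuple, nothing to decide
    if not isinstance(index, list) or len(index) > MAXDIMS:
        return False                     # arrays and long lists stay one index
    return any(isinstance(item, (list, tuple, slice)) or item is None
               for item in index)


print(legacy_unpacks_like_tuple([1, 0]))                # False
print(legacy_unpacks_like_tuple([[1, 0]]))              # True
print(legacy_unpacks_like_tuple(([[1, 0]], Ellipsis)))  # True

# Spellings that do not depend on the heuristic:
D2 = np.arange(4).reshape((2, 2))
as_tuple = D2[([1, 0],)]               # one 1-D index for axis 0
as_array = D2[np.array([[1, 0]])]      # one (1, 2)-shaped index for axis 0
print(as_tuple.shape, as_array.shape)  # (2, 2) (1, 2, 2)
```

Writing the tuple or the array explicitly is probably the safest choice whenever the nesting depth of the index matters.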

Source: https://stackoverflow.com/questions/41233678/numpy-indexing-shouldnt-trailing-ellipsis-be-redundant

Tags

python

numpy

array-indexing

Using python range objects to index into numpy arrays

前提是你 — Thu, 12 Dec 2019 10:16:43 +0000

Question

I've seen it once or twice before, but I can't seem to find any official docs on it: Using python range objects as indices in numpy.

import numpy as np
a = np.arange(9).reshape(3,3)
a[range(3), range(2,-1,-1)]
# array([2, 4, 6])

Let's trigger an index error just to confirm that ranges are not in the official range (pun intended) of legal indexing methods:

a['x']

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Now, a slight divergence between numpy and its docs is not entirely unheard of and does not necessarily indicate that a feature is not intended (see for example here).

So, does anybody know why this works at all? And if it is an intended feature what are the exact semantics / what is it good for? And are there any ND generalizations?

Answer 1:

Not a proper answer, but too long for comment.

In fact, it seems to work with about any indexable object:

import numpy as np

class MyIndex:
    def __init__(self, n):
        self.n = n
    def __getitem__(self, i):
        if i < 0 or i >= self.n:
            raise IndexError
        return i
    def __len__(self):
        return self.n

a = np.array([1, 2, 3])
print(a[MyIndex(2)])
# [1 2]

I think the relevant lines in NumPy's code are below this comment in core/src/multiarray/mapping.c:

/*
 * Some other type of short sequence - assume we should unpack it like a
 * tuple, and then decide whether that was actually necessary.
 */

But I'm not entirely sure. For some reason, this hangs if you remove the if i < 0 or i >= self.n: raise IndexError, even though there is a __len__, so at some point it seems to be iterating through the given object until IndexError is raised.
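
One plausible explanation for the hang, which I have not confirmed against NumPy's source: Python's legacy sequence iteration protocol ignores __len__ and keeps calling __getitem__ with 0, 1, 2, ... until IndexError is raised. The sketch below shows that protocol in plain Python; the class names are made up for illustration.

```python
from itertools import islice


class Bounded:
    """Raises IndexError past the end, so iteration terminates."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if i < 0 or i >= self.n:
            raise IndexError
        return i

    def __len__(self):
        return self.n


class Unbounded:
    """Never raises IndexError; __len__ is not consulted by iter()."""
    def __getitem__(self, i):
        return i

    def __len__(self):
        return 2


bounded_items = list(Bounded(2))
print(bounded_items)                      # [0, 1]

# list(Unbounded()) would never finish, so only take a few items.
first_five = list(islice(iter(Unbounded()), 5))
print(first_five)                         # [0, 1, 2, 3, 4]
```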

Answer 2:

Just to wrap this up (thanks to @WarrenWeckesser in the comments): This behavior is actually documented. One only has to realize that range objects are python sequences in the strict sense.

So this is just a case of fancy indexing. Be warned, though, that it is very slow:

>>> import numpy as np
>>> from timeit import timeit
>>> a = np.arange(100000)
>>> timeit(lambda: a[range(100000)], number=1000)
12.969507368048653
>>> timeit(lambda: a[list(range(100000))], number=1000)
7.990526253008284
>>> timeit(lambda: a[np.arange(100000)], number=1000)
0.22483703796751797
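
Given those timings, it may be worth converting a range to an integer array before indexing. A minimal sketch, where range_to_index is a hypothetical helper of mine and not a NumPy function; the np.ix_ line shows one possible ND generalization (selecting every listed row against every listed column):

```python
import numpy as np


def range_to_index(r):
    """Hypothetical helper: turn a range into an integer index array."""
    return np.arange(r.start, r.stop, r.step)


a = np.arange(9).reshape(3, 3)
rows, cols = range(3), range(2, -1, -1)

# Paired (pointwise) indexing: picks a[0, 2], a[1, 1], a[2, 0].
paired = a[range_to_index(rows), range_to_index(cols)]
print(paired)                                  # [2 4 6]
print(np.array_equal(paired, a[rows, cols]))   # True

# Outer-product selection: rows 0-1 crossed with columns 1-2.
block = a[np.ix_(range(2), range(1, 3))]
print(block.tolist())                          # [[1, 2], [4, 5]]
```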

Source: https://stackoverflow.com/questions/53123009/using-python-range-objects-to-index-into-numpy-arrays

Tags

python

numpy

array-indexing

Do pointers support “array style indexing”?

喜夏-厌秋 — Tue, 26 Nov 2019 19:40:56 +0000

Question

(Self-answered Q&A - this matter keeps popping up)

I assume that the reader is aware of how pointer arithmetic works.

int arr[3] = {1,2,3};
int* ptr = arr;
...
*(ptr + i) = value;

Teachers/C books keep telling me I shouldn't use *(ptr + i) like in the above example, because "pointers support array style indexing" and I should be using ptr[i] = value; instead. No argument there - much easier to read.

But looking through the C standard, I find nothing called "array style indexing". In fact, the [] operator does not expect either operand to be an array, but instead a pointer and an integer!

6.5.2.1 Array subscripting

Constraints

One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.

Why does the array subscripting operator not expect an array? Is the standard wrong? Is my teacher/C book confused?

Answer 1:

You should indeed be using ptr[i] over *(ptr + i) for readability reasons. But apart from that, the [] operator is, strictly speaking, actually never used with an array operand.

Arrays, when used in an expression, "decay" into a pointer to the first element, with a few exceptions. C17 6.3.2.1/3, emphasis mine:

Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.

Meaning that whenever you type arr[i], the operand arr gets replaced by a pointer to the first element inside that array. This is informally referred to as "array decaying". More info here: What is array decaying?

So whenever you use the [] operator, you use it on a pointer. Always.

The C standard says that this operator is guaranteed to be equivalent to the pointer arithmetic (C17 6.5.2.1/2):

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

So whenever we type arr[i], it actually gets silently replaced by *(arr+i), where arr is still a pointer to the first element.

And this is why the description you quoted tells you that either operand could be a pointer and the other an integer. Because obviously it doesn't matter if we type *(arr+i) or *(i+arr) - that's equivalent code.

Which in turn allows us to write obfuscated "joke" code like i[arr], which is actually valid C and fully equivalent to arr[i]. But don't write such code in real applications.

Source: https://stackoverflow.com/questions/55747822/do-pointers-support-array-style-indexing

Tags