Understanding == applied to a NumPy array


I'm new to Python, and I am learning TensorFlow. In a tutorial using the notMNIST dataset, they give example code to transform the labels matrix

2 Answers

南笙

2020-12-08 16:30
There are a few things going on here: numpy's vector ops, adding a singleton axis, and broadcasting.

First, look at what == does on its own, since that is where the magic happens.

Let's say we start with a simple label array. == behaves in a vectorized fashion, which means that we can compare the entire array with a scalar and get an array consisting of the values of each elementwise comparison. For example:
```
>>> import numpy as np
>>> labels = np.array([1,2,0,0,2])
>>> labels == 0
array([False, False,  True,  True, False], dtype=bool)
>>> (labels == 0).astype(np.float32)
array([ 0.,  0.,  1.,  1.,  0.], dtype=float32)
```
First we get a boolean array, and then we coerce to floats: False==0 in Python, and True==1. So we wind up with an array which is 0 where labels isn't equal to 0 and 1 where it is.

But there's nothing special about comparing to 0; we could compare to 1 or 2 or 3 instead for similar results:
```
>>> (labels == 2).astype(np.float32)
array([ 0.,  1.,  0.,  0.,  1.], dtype=float32)
```
In fact, we could loop over every possible label and generate this array. We could use a list comprehension:
```
>>> np.array([(labels == i).astype(np.float32) for i in np.arange(3)])
array([[ 0.,  0.,  1.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.]], dtype=float32)
```
but this doesn't really take advantage of numpy. What we want to do is have each possible label compared with each element, in other words to compare
```
>>> np.arange(3)
array([0, 1, 2])
```
with
```
>>> labels
array([1, 2, 0, 0, 2])
```
And here's where the magic of numpy broadcasting comes in. Right now, labels is a 1-dimensional array of shape (5,). If we make it a 2-dimensional array of shape (5,1), then comparing it with the range, which has shape (3,), will "broadcast" the two against each other, and we'll get an output of shape (5,3) with the results of comparing each entry in the range with each element of labels.

First we can add an "extra" axis to labels using None (or np.newaxis), changing its shape:
```
>>> labels[:,None]
array([[1],
       [2],
       [0],
       [0],
       [2]])
>>> labels[:,None].shape
(5, 1)
```
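If you want to check the shape a broadcast will produce before running the comparison, something like the following sketch should work. It reuses the labels array from above; the names classes and result_shape are just illustrative.

```python
import numpy as np

labels = np.array([1, 2, 0, 0, 2])
classes = np.arange(3)  # illustrative name for the range of possible labels

# np.broadcast pairs the operands up without doing the comparison;
# its .shape is the shape an elementwise operation on them would have.
result_shape = np.broadcast(classes, labels[:, None]).shape
print(result_shape)  # (5, 3)
```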
And then we can make the comparison (this is the transpose of the arrangement we were looking at earlier, but that doesn't really matter).
```
>>> np.arange(3) == labels[:,None]
array([[False,  True, False],
       [False, False,  True],
       [ True, False, False],
       [ True, False, False],
       [False, False,  True]], dtype=bool)
>>> (np.arange(3) == labels[:,None]).astype(np.float32)
array([[ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.]], dtype=float32)
```
Broadcasting in numpy is very powerful, and well worth reading up on.
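
Putting the pieces together, the whole trick can be wrapped in a small function. This is only a sketch: one_hot and num_classes are hypothetical names, not something from the tutorial, and it assumes the labels are integers in the range 0 to num_classes - 1.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels into one-hot float32 rows."""
    labels = np.asarray(labels)
    # (num_classes,) compared with (n, 1) broadcasts to (n, num_classes)
    return (np.arange(num_classes) == labels[:, None]).astype(np.float32)

labels = np.array([1, 2, 0, 0, 2])
encoded = one_hot(labels, 3)
print(encoded.shape)  # (5, 3)

# Each row holds a single 1, in the column given by that row's label,
# so argmax along the rows recovers the original labels.
print(encoded.argmax(axis=1))  # [1 2 0 0 2]
```

Indexing an identity matrix, as in np.eye(3, dtype=np.float32)[labels], should give the same array; which one reads better is mostly a matter of taste.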

后悔当初

2020-12-08 16:45
In short, == applied to a numpy array means applying element-wise == to the array. The result is an array of booleans. Here is an example:
```
>>> import numpy as np
>>> b = np.array([1,0,0,1,1,0])
>>> b == 1
array([ True, False, False,  True,  True, False], dtype=bool)
```
To count, say, how many 1s there are in b, you don't need to cast the array to float, i.e. the .astype(np.float32) can be skipped: in Python, bool is a subclass of int, with True == 1 and False == 0, and numpy likewise counts True as 1 when summing a boolean array. So here is how you count how many ones are in b:
```
>>> np.sum(b == 1)
3
```
Or:
```
>>> np.count_nonzero(b == 1)
3
```
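The same idea extends to counting every value at once. The sketch below assumes b holds only the values 0 and 1, as in the example above; counts is just an illustrative name.

```python
import numpy as np

b = np.array([1, 0, 0, 1, 1, 0])

# Compare every element with every candidate value (0 and 1), then
# sum the booleans down each column; True counts as 1.
counts = (np.arange(2) == b[:, None]).sum(axis=0)
print(counts)  # [3 3]

# For non-negative integers, np.bincount gives the same tallies.
print(np.bincount(b))  # [3 3]
```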