How to make Numpy treat each row/tensor as a value

Question

Many functions, such as in1d and setdiff1d, are designed for 1-D arrays. One workaround for applying them to N-dimensional arrays is to make NumPy treat each row (or a higher-dimensional subarray) as a single value.

One approach I found is in this answer by Joe Kington: Get intersecting rows across two 2D numpy arrays.

The following code is taken from this answer. The task Joe Kington faced was to detect common rows in two arrays A and B while trying to use in1d.

import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])

nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}

C = np.intersect1d(A.view(dtype), B.view(dtype))

# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)

I would appreciate help with any of the following three questions. First, I do not understand the mechanism behind this method. Could you explain it?

Second, are there other ways to make NumPy treat a subarray as one object?

One more open question: does Joe's approach have any drawbacks? That is, could treating rows as values cause problems? Sorry that this question is quite broad.

Answer 1:

Here is what I have learned. The method Joe used is based on structured arrays, which let users define what a single element of the array contains.

Let's look at the first example in the documentation and its description.

x = np.array([(1,2.,'Hello'), (2,3.,"World")],
             dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
Here we have created a one-dimensional array of length 2. Each element of this array is a structure that contains three items, a 32-bit integer, a 32-bit float, and a string of length 10 or less.

Without passing in dtype, however, we would get an ordinary 2-by-3 array, with all values promoted to a common (string) type.
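To make the difference concrete, here is a minimal sketch that builds the documentation's example both ways. The variable names records and y are mine, not from the documentation.

```python
import numpy as np

records = [(1, 2.0, "Hello"), (2, 3.0, "World")]

# With a structured dtype, each tuple becomes ONE element with three fields.
x = np.array(records, dtype=[("foo", "i4"), ("bar", "f4"), ("baz", "S10")])
print(x.shape)         # (2,)
print(x.dtype.names)   # ('foo', 'bar', 'baz')
print(x["foo"])        # field access gives a plain array of the 'foo' values

# Without a dtype, NumPy treats the tuples as rows of an ordinary 2-D array.
y = np.array(records)
print(y.shape)         # (2, 3)
```

Note that x is one-dimensional: functions written for 1-D input see two elements, not six numbers.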

With a properly chosen dtype, then, NumPy can treat a row or higher-dimensional subarray as a single element.
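Applied to the arrays from the question, the idea can be sketched as follows. This restates Joe's code with a few inspection steps added; the names row_dtype, A_rows, common and common_2d are mine, and the sketch assumes A and B are C-contiguous and share the same dtype.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

ncols = A.shape[1]
# One field per column, so one structured element spans exactly one row.
row_dtype = np.dtype({"names": [f"f{i}" for i in range(ncols)],
                      "formats": ncols * [A.dtype]})

A_rows = A.view(row_dtype)
print(A_rows.shape)                  # (3, 1): the two columns collapsed into one element
print(np.shares_memory(A, A_rows))   # True: a reinterpretation, not a copy

# 1-D set functions now compare whole rows.
common = np.intersect1d(A_rows, B.view(row_dtype))

# Convert back to an ordinary 2-D array.
common_2d = common.view(A.dtype).reshape(-1, ncols)
print(common_2d)                     # rows [1 4] and [3 6]
```

The structured element occupies the same bytes as one row, which is why the view needs no copy and why the result can be viewed back as A.dtype afterwards.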

Another trick Joe showed is that we do not need to build a new NumPy array to achieve this. We can use the view method (see ndarray.view) to change how NumPy interprets the existing data. The Notes section of ndarray.view is worth reading before using this method; I cannot guarantee that there are no side effects. The paragraph below is from that Notes section and seems to call for caution.

For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
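One defensive way to act on this note is to make the array C-contiguous before viewing it. The sketch below is my own suggestion rather than part of Joe's answer, and rows_as_records is a hypothetical helper name. It assumes a 2-D input with a plain (non-structured) dtype.

```python
import numpy as np

def rows_as_records(a):
    """Hypothetical helper: return a 1-D structured array, one element per row of `a`."""
    a = np.ascontiguousarray(a)  # copies only when `a` is not already C-contiguous
    ncols = a.shape[1]
    row_dtype = np.dtype({"names": [f"f{i}" for i in range(ncols)],
                          "formats": ncols * [a.dtype]})
    return a.view(row_dtype).ravel()

wide = np.arange(12).reshape(3, 4)
cols = wide[:, 1:3]                  # a slice: its rows are not adjacent in memory
print(cols.flags["C_CONTIGUOUS"])    # False

recs = rows_as_records(cols)
print(recs.shape)                    # (3,)

# Round trip back to a regular 2-D array to check that the rows survived intact.
back = recs.view(cols.dtype).reshape(-1, cols.shape[1])
print(np.array_equal(back, cols))    # True
```

The trade-off is that the result may be a copy, so it no longer shares memory with the original when the input was a slice, a transpose, or Fortran-ordered.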

Other references

https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html

Source: https://stackoverflow.com/questions/48178213/how-to-make-numpy-treat-each-row-tensor-as-a-value

Tags

python

arrays

numpy

n-dimensional