Difference between numpy.array shape (R, 1) and (R,)

甜味超标 2020-11-22 04:25

In NumPy, some operations return arrays of shape (R, 1) while others return (R,). This makes matrix multiplication more tedious since

6 Answers
  • 2020-11-22 04:26

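    The shape is a tuple with one entry per dimension. A 1-dimensional array has a single number followed by a trailing comma with nothing after it, e.g. (2,); the comma is just Python's notation for a one-element tuple. With 2 or more dimensions, every comma is followed by a number.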

    import numpy as np

    # 1 dimension with 2 elements, shape = (2,).
    # Note there's nothing after the comma.
    z = np.array([  # start dimension
        10,         # not a dimension
        20          # not a dimension
    ])              # end dimension
    print(z.shape)
    

    (2,)

    # 2 dimensions: the outer has 2 elements, each inner has 1, shape = (2, 1)
    w=np.array([  # start outer dimension
        [10],     # element is in an inner dimension
        [20]      # element is in an inner dimension
    ])            # end outer dimension
    print(w.shape)
    

    (2, 1)
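
To make the difference concrete, here is a small sketch (the checks are illustrative additions, not part of the original answer) comparing how many axes each array has and how many indices it takes to reach an element:

```python
import numpy as np

z = np.array([10, 20])        # shape (2,): one axis
w = np.array([[10], [20]])    # shape (2, 1): two axes, the second of length 1

print(z.ndim, z.shape)        # 1 (2,)
print(w.ndim, w.shape)        # 2 (2, 1)

# A 1-D array takes one index; a 2-D array takes two.
print(z[1])                   # 20
print(w[1, 0])                # 20

# Both hold the same two values; only the indexing scheme differs.
same_values = np.array_equal(z, w.ravel())
print(same_values)            # True
```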

  • 2020-11-22 04:31

    For NumPy's base array class (ndarray), 2d arrays are no more special than 1d or 3d ones. Some operations preserve the number of dimensions, some reduce it, and others combine or even expand dimensions.

    M=np.arange(9).reshape(3,3)
    M[:,0].shape # (3,) selects one column, returns a 1d array
    M[0,:].shape # same, one row, 1d array
    M[:,[0]].shape # (3,1), index with a list (or array), returns 2d
    M[:,[0,1]].shape # (3,2)
    
    In [20]: np.dot(M[:,0].reshape(3,1),np.ones((1,3)))
    
    Out[20]: 
    array([[ 0.,  0.,  0.],
           [ 3.,  3.,  3.],
           [ 6.,  6.,  6.]])
    
    In [21]: np.dot(M[:,[0]],np.ones((1,3)))
    Out[21]: 
    array([[ 0.,  0.,  0.],
           [ 3.,  3.,  3.],
           [ 6.,  6.,  6.]])
    

    Other expressions that give the same array:

    np.dot(M[:,0][:,np.newaxis],np.ones((1,3)))
    np.dot(np.atleast_2d(M[:,0]).T,np.ones((1,3)))
    np.einsum('i,j',M[:,0],np.ones((3)))
    M1=M[:,0]; R=np.ones((3)); np.dot(M1[:,None], R[None,:])
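
As a quick sanity check, the sketch below confirms that these expressions all produce the same (3, 3) array. The last variant, plain broadcasting, is an extra not listed in the answer; the variable names are chosen only for illustration.

```python
import numpy as np

M = np.arange(9).reshape(3, 3)
col = M[:, 0]                 # shape (3,)
row = np.ones(3)              # shape (3,)

# Reference result: (3, 1) times (1, 3) gives (3, 3)
expected = np.dot(M[:, [0]], np.ones((1, 3)))

variants = [
    np.dot(col[:, np.newaxis], np.ones((1, 3))),
    np.dot(np.atleast_2d(col).T, np.ones((1, 3))),
    np.einsum('i,j', col, row),
    np.dot(col[:, None], row[None, :]),
    col[:, None] * row,       # broadcasting: (3, 1) * (3,) -> (3, 3)
]

all_match = all(v.shape == (3, 3) and np.array_equal(v, expected)
                for v in variants)
print(all_match)              # True
```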
    

    MATLAB started out with just 2D arrays. Newer versions allow more dimensions, but retain the lower bound of 2. So you still have to pay attention to the difference between a row matrix and a column matrix, i.e. shape (1,3) vs. (3,1). How often have you written [1,2,3].'? I was going to write row vector and column vector, but with that 2d constraint there aren't any vectors in MATLAB, at least not in the mathematical sense of a vector being 1d.

    Have you looked at np.atleast_2d (also _1d and _3d versions)?
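
For reference, a short sketch of what the atleast_* helpers return. Note that np.atleast_2d puts the new axis in front, so a 1-D input becomes a row and needs a transpose to become a column:

```python
import numpy as np

v = np.arange(3)                    # shape (3,)

print(np.atleast_1d(5).shape)       # (1,)
print(np.atleast_2d(v).shape)       # (1, 3): the new axis is added in front
print(np.atleast_2d(v).T.shape)     # (3, 1): transpose to get a column
print(np.atleast_3d(v).shape)       # (1, 3, 1)

m = np.ones((3, 1))
print(np.atleast_2d(m).shape)       # (3, 1): already 2-D, left unchanged
```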

  • 2020-11-22 04:38

    1. The meaning of shapes in NumPy

    You write, "I know literally it's list of numbers and list of lists where all list contains only a number" but that's a bit of an unhelpful way to think about it.

    The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.

    For example, if we create an array of 12 integers:

    >>> a = numpy.arange(12)
    >>> a
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
    

    Then a consists of a data buffer, arranged something like this:

    ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
    │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
    └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
    

    and a view which describes how to interpret the data:

    >>> a.flags
      C_CONTIGUOUS : True
      F_CONTIGUOUS : True
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
    >>> a.dtype
    dtype('int64')
    >>> a.itemsize
    8
    >>> a.strides
    (8,)
    >>> a.shape
    (12,)
    

    Here the shape (12,) means the array is indexed by a single index which runs from 0 to 11. Conceptually, if we label this single index i, the array a looks like this:

    i= 0    1    2    3    4    5    6    7    8    9   10   11
    ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
    │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
    └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
    

    If we reshape an array, this doesn't change the data buffer. Instead, it creates a new view that describes a different way to interpret the data. So after:

    >>> b = a.reshape((3, 4))
    

    the array b has the same data buffer as a, but now it is indexed by two indices which run from 0 to 2 and 0 to 3 respectively. If we label the two indices i and j, the array b looks like this:

    i= 0    0    0    0    1    1    1    1    2    2    2    2
    j= 0    1    2    3    0    1    2    3    0    1    2    3
    ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
    │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
    └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
    

    which means that:

    >>> b[2,1]
    9
    

    You can see that the second index changes quickly and the first index changes slowly. If you prefer this to be the other way round, you can specify the order parameter:

    >>> c = a.reshape((3, 4), order='F')
    

    which results in an array indexed like this:

    i= 0    1    2    0    1    2    0    1    2    0    1    2
    j= 0    0    0    1    1    1    2    2    2    3    3    3
    ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
    │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
    └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
    

    which means that:

    >>> c[2,1]
    5
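
Which index "changes quickly" is recorded in the array's strides (bytes to step per index). The sketch below, which pins the dtype to int64 so the numbers match the 8-byte items shown earlier, checks that a, b and c share one buffer and differ only in shape and strides:

```python
import numpy as np

a = np.arange(12, dtype=np.int64)   # 8-byte items, as in the output above
b = a.reshape((3, 4))               # C order: last index moves fastest
c = a.reshape((3, 4), order='F')    # Fortran order: first index moves fastest

# All three arrays look at the same data buffer.
shared = np.shares_memory(a, b) and np.shares_memory(a, c)
print(shared)        # True

print(a.strides)     # (8,)
print(b.strides)     # (32, 8)
print(c.strides)     # (8, 24)

# Writing through one view is visible through the others.
a[5] = -1
print(b[1, 1], c[2, 1])   # -1 -1
```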
    

    It should now be clear what it means for an array to have a shape with one or more dimensions of size 1. After:

    >>> d = a.reshape((12, 1))
    

    the array d is indexed by two indices, the first of which runs from 0 to 11, and the second index is always 0:

    i= 0    1    2    3    4    5    6    7    8    9   10   11
    j= 0    0    0    0    0    0    0    0    0    0    0    0
    ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
    │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
    └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
    

    and so:

    >>> d[10,0]
    10
    

    A dimension of length 1 is "free" (in some sense), so there's nothing stopping you from going to town:

    >>> e = a.reshape((1, 2, 1, 6, 1))
    

    giving an array indexed like this:

    i= 0    0    0    0    0    0    0    0    0    0    0    0
    j= 0    0    0    0    0    0    1    1    1    1    1    1
    k= 0    0    0    0    0    0    0    0    0    0    0    0
    l= 0    1    2    3    4    5    0    1    2    3    4    5
    m= 0    0    0    0    0    0    0    0    0    0    0    0
    ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
    │  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
    └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
    

    and so:

    >>> e[0,1,0,0,0]
    6
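
Going the other way is just as cheap. As an illustrative sketch (np.squeeze and np.expand_dims are not mentioned in the answer itself), length-1 axes can be dropped or added without touching the data:

```python
import numpy as np

a = np.arange(12)
e = a.reshape((1, 2, 1, 6, 1))

s = np.squeeze(e)                         # drop every axis of length 1
print(s.shape)                            # (2, 6)
print(np.squeeze(e, axis=0).shape)        # (2, 1, 6, 1): drop only axis 0
print(np.expand_dims(a, axis=1).shape)    # (12, 1): add an axis of length 1

# Same element, fewer indices; still the same underlying buffer.
print(e[0, 1, 0, 0, 0], s[1, 0])          # 6 6
print(np.shares_memory(a, s))             # True
```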
    

    See the NumPy internals documentation for more details about how arrays are implemented.

    2. What to do?

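    Since numpy.reshape just creates a new view whenever the memory layout allows it (it falls back to copying only when it does not, for example after some transposes), you shouldn't be scared about using it whenever necessary. It's the right tool to use when you want to index an array in a different way.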
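
One caveat worth checking in your own code: reshape can only return a view when the requested shape is expressible with strides over the existing buffer. The sketch below uses np.shares_memory to show both cases; the transposed array is just one example of a layout that forces a copy.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)          # contiguous source: reshape returns a view
view_case = np.shares_memory(a, b)
print(view_case)             # True

t = b.T                      # transposed view, no longer C-contiguous
flat = t.reshape(12)         # not expressible with one stride, so NumPy copies
copy_case = np.shares_memory(t, flat)
print(copy_case)             # False
```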

    However, in a long computation it's usually possible to arrange to construct arrays with the "right" shape in the first place, and so minimize the number of reshapes and transposes. But without seeing the actual context that led to the need for a reshape, it's hard to say what should be changed.

    The example in your question is:

    numpy.dot(M[:,0], numpy.ones((1, R)))
    

    but this is not realistic. First, this expression:

    M[:,0].sum()
    

    computes the result more simply. Second, is there really something special about column 0? Perhaps what you actually need is:

    M.sum(axis=0)
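
A small check of the claim above, assuming the second operand was meant to be a length-R vector of ones (the 3x3 matrix M is made up for illustration). It also shows keepdims=True, which is one way to ask a reduction to keep the (1, R) or (R, 1) shape instead of collapsing to 1-D:

```python
import numpy as np

M = np.arange(9).reshape(3, 3)
R = M.shape[0]

# Dot product of a column with a vector of ones is just the column sum.
dot_result = np.dot(M[:, 0], np.ones(R))
print(dot_result, M[:, 0].sum())            # 9.0 9

col_sums = M.sum(axis=0)
print(col_sums, col_sums.shape)             # [ 9 12 15] (3,)

# keepdims=True preserves the reduced axis as length 1.
print(M.sum(axis=0, keepdims=True).shape)   # (1, 3)
print(M.sum(axis=1, keepdims=True).shape)   # (3, 1)
```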
    
  • 2020-11-22 04:42

    1) The reason not to prefer a shape of (R, 1) over (R,) is that it unnecessarily complicates things. Besides, why would it be preferable to have shape (R, 1) by default for a length-R vector instead of (1, R)? It's better to keep it simple and be explicit when you require additional dimensions.

    2) For your example, you are computing an outer product so you can do this without a reshape call by using np.outer:

    np.outer(M[:,0], np.ones((1, R)))
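
A runnable version of this, with a made-up 3x3 M and R = 3. Because np.outer flattens its inputs, it does not matter whether the ones are passed as shape (R,) or (1, R):

```python
import numpy as np

M = np.arange(9).reshape(3, 3)
R = 3

outer = np.outer(M[:, 0], np.ones(R))   # inputs are flattened first
print(outer.shape)                      # (3, 3)

# Same result with a (1, R) second argument, and with an explicit
# (R, 1) @ (1, R) matrix product.
same_2d = np.array_equal(outer, np.outer(M[:, 0], np.ones((1, R))))
same_mm = np.array_equal(outer, M[:, [0]] @ np.ones((1, R)))
print(same_2d, same_mm)                 # True True
```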
    
  • 2020-11-22 04:48

    The difference between (R,) and (1,R) is literally the number of indices that you need to use. ones((1,R)) is a 2-D array that happens to have only one row. ones(R) is a vector. Generally if it doesn't make sense for the variable to have more than one row/column, you should be using a vector, not a matrix with a singleton dimension.

    For your specific case, there are a couple of options:

    1) Just make the second argument a vector. The following works fine:

        np.dot(M[:,0], np.ones(R))
    

    2) If you want MATLAB-like matrix operations, use the class matrix instead of ndarray. All matrices are forced into being 2-D arrays, and operator * does matrix multiplication instead of element-wise (so you don't need dot). In my experience, this is more trouble than it is worth, but it may be nice if you are used to MATLAB. Note that the current NumPy documentation no longer recommends the matrix class.
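
On Python 3.5 and later, the @ operator performs matrix multiplication on plain ndarrays, which covers much of what option 2 is after. A brief sketch (with an illustrative 3x3 M) of how @ treats 1-D versus (R, 1) operands:

```python
import numpy as np

M = np.arange(9).reshape(3, 3)
v = np.ones(3)               # 1-D vector, shape (3,)
col = v.reshape(-1, 1)       # column, shape (3, 1)

print((M @ v).shape)         # (3,)   1-D in, 1-D out
print((v @ M).shape)         # (3,)   a 1-D vector works on either side
print((M @ col).shape)       # (3, 1) 2-D in, 2-D out
print(v @ v)                 # 3.0    two 1-D vectors give a scalar
```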

  • 2020-11-22 04:52

    There are a lot of good answers here already, but I found it hard to find an example where the shape of an array can break a whole program.

    So here is one:

    import numpy as np
    a = np.array([1,2,3,4])
    b = np.array([10,20,30,40])
    
    
    from sklearn.linear_model import LinearRegression
    regr = LinearRegression()
    regr.fit(a,b)
    

    This will fail with error:

    ValueError: Expected 2D array, got 1D array instead

    but if we add reshape to a:

    a = np.array([1,2,3,4]).reshape(-1,1)
    

    this works correctly!
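
A NumPy-only pitfall in the same spirit, sketched with made-up data: mixing shapes (R,) and (R, 1) in arithmetic does not raise an error. Broadcasting silently produces an (R, R) array, so a result can be wrong without any exception.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])                   # shape (4,)
y_pred = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)    # shape (4, 1)

diff = y_true - y_pred       # broadcasts (4,) against (4, 1)
print(diff.shape)            # (4, 4), not (4,)

# The predictions are perfect, yet the "mean squared error" is not zero.
wrong_mse = (diff ** 2).mean()
print(wrong_mse)             # 2.5

# Making the shapes agree first gives the expected answer.
right_mse = ((y_true - y_pred.ravel()) ** 2).mean()
print(right_mse)             # 0.0
```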
