Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation

盖世英雄少女心 2021-01-30 22:09

I have tried the following code but couldn't see the difference between np.dot and np.multiply combined with np.sum.

Here is np.dot

4 Answers
  • 2021-01-30 22:39

    If Y and A2 are (1,N) arrays, then np.dot(Y,A2.T) produces a (1,1) result. It is a matrix multiplication of a (1,N) array with an (N,1) array. The N dimension is summed over, leaving (1,1).

    With multiply the result is (1,N). Sum all values, and the result is a scalar.
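
A quick shape check makes this concrete. The sketch below uses made-up values for Y and A2 (they are not from the question) and only standard NumPy calls:

```python
import numpy as np

# Illustrative values, not taken from the question
Y = np.array([[1, 0, 1, 1]])            # shape (1, 4)
A2 = np.array([[0.9, 0.2, 0.7, 0.6]])   # shape (1, 4)

dot_result = np.dot(Y, A2.T)            # (1, 4) times (4, 1) -> (1, 1)
mul_result = np.multiply(Y, A2)         # element-wise -> (1, 4)
sum_result = np.sum(mul_result)         # sums every element -> 0-d NumPy scalar

print(dot_result.shape)     # (1, 1)
print(mul_result.shape)     # (1, 4)
print(np.ndim(sum_result))  # 0
print(np.isclose(dot_result[0, 0], sum_result))  # True
```

The values agree; only the container around the value differs.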

    If Y and A2 were (N,) shaped (same number of elements, but 1-D), then np.dot(Y,A2) (no .T) would also produce a scalar. From the np.dot documentation:

    For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors

    Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned.

    squeeze removes all size-1 dimensions, but still returns an array. In numpy an array can have any number of dimensions from 0 up to a fixed maximum (32 in NumPy 1.x, 64 since NumPy 2.0), so a 0d array is possible. Compare the shapes of np.array(3), np.array([3]) and np.array([[3]]).
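
These shapes can be inspected directly. This is a small sketch; the variable names and values are chosen only for illustration:

```python
import numpy as np

y = np.array([1, 0, 1])          # 1-D, shape (3,)
a = np.array([0.9, 0.2, 0.7])    # 1-D, shape (3,)

inner = np.dot(y, a)             # 1-D inputs: inner product, no transpose needed
print(np.ndim(inner))            # 0

for arr in (np.array(3), np.array([3]), np.array([[3]])):
    print(arr.shape, arr.ndim)   # () 0, then (1,) 1, then (1, 1) 2

squeezed = np.squeeze(np.array([[3]]))
print(type(squeezed).__name__, squeezed.shape)  # ndarray ()
```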

  • 2021-01-30 22:42

    np.dot is the dot product of two matrices.

    |A B| . |E F| = |A*E+B*G A*F+B*H|
    |C D|   |G H|   |C*E+D*G C*F+D*H|
    

    Whereas np.multiply does an element-wise multiplication of two matrices.

    |A B| ⊙ |E F| = |A*E B*F|
    |C D|   |G H|   |C*G D*H|
    

    In general the two give different results, even after np.sum; they agree only for special shapes, such as a (1,N) array dotted with an (N,1) array.

    >>> np.dot([[1,2], [3,4]], [[1,2], [2,3]])
    array([[ 5,  8],
           [11, 18]])
    >>> np.multiply([[1,2], [3,4]], [[1,2], [2,3]])
    array([[ 1,  4],
           [ 6, 12]])
    
    >>> np.sum(np.dot([[1,2], [3,4]], [[1,2], [2,3]]))
    42
    >>> np.sum(np.multiply([[1,2], [3,4]], [[1,2], [2,3]]))
    23
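
The two operations are still related: the sum of an element-wise product equals the sum of the diagonal of np.dot(A, B.T), which for (1,N) inputs is the single (1,1) entry. The check below reuses the matrices above; the names A and B are introduced here only for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [2, 3]])

elementwise_sum = np.sum(np.multiply(A, B))   # 1 + 4 + 6 + 12
diagonal_sum = np.trace(np.dot(A, B.T))       # diagonal of A times B transposed

print(elementwise_sum)  # 23
print(diagonal_sum)     # 23
```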
    
  • 2021-01-30 22:51

    What you're doing is calculating the binary cross-entropy loss, which measures how bad the model's predictions (here: A2) are compared to the true outputs (here: Y).

    Here is a reproducible example for your case, which should explain why you get a scalar in the second case using np.sum

    In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
    
    In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
    
    In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
    
    # `np.dot` returns 2D array since its arguments are 2D arrays
    In [91]: logprobs
    Out[91]: array([[-0.78914626]])
    
    # m is the number of examples: m = Y.shape[1], i.e. 8 here
    In [92]: cost = (-1/m) * logprobs
    
    In [93]: cost
    Out[93]: array([[ 0.09864328]])
    
    In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
    
    # np.sum returns scalar since it sums everything in the 2D array
    In [95]: logprobs
    Out[95]: -0.78914625761870361
    

    Note that np.dot sums only along the inner dimensions, which must match: here the shapes are (1x8) and (8x1). The 8s disappear in the dot product (matrix multiplication), leaving a (1x1) result. That is a single value, but it is returned as a 2D array of shape (1,1).
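
Putting the two variants side by side, a cost function could look like the sketch below. The function names cost_dot and cost_sum are made up for this illustration, and m is assumed to be the number of examples, Y.shape[1]:

```python
import numpy as np

def cost_dot(A2, Y):
    """Binary cross-entropy via np.dot; inputs have shape (1, m)."""
    m = Y.shape[1]  # assumption: one example per column
    logprobs = np.dot(Y, np.log(A2).T) + np.dot(1.0 - Y, np.log(1.0 - A2).T)
    return (-1.0 / m) * logprobs.item()   # (1, 1) array -> Python float

def cost_sum(A2, Y):
    """Binary cross-entropy via element-wise multiply and np.sum."""
    m = Y.shape[1]
    logprobs = np.sum(np.multiply(Y, np.log(A2))
                      + np.multiply(1.0 - Y, np.log(1.0 - A2)))
    return float((-1.0 / m) * logprobs)

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

print(np.isclose(cost_dot(A2, Y), cost_sum(A2, Y)))  # True
```

Both functions return a plain Python float, so the caller does not need to care which variant was used.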


    Also, most importantly, note that here np.dot is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):

    In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
    
    In [108]: logprobs
    Out[108]: array([[-0.78914626]])
    
    In [109]: logprobs.shape
    Out[109]: (1, 1)
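
The equivalence holds only for 2-D inputs. For arrays with more dimensions the two functions follow different rules, as this shape-only sketch (with arbitrary array sizes chosen for illustration) shows:

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# np.dot sums over the last axis of a and the second-to-last axis of b
print(np.dot(a, b).shape)     # (2, 3, 2, 5)

# np.matmul treats the leading axis as a batch of matrices
print(np.matmul(a, b).shape)  # (2, 3, 5)

# With 2-D inputs both give the same result
m1 = np.arange(6).reshape(2, 3)
m2 = np.arange(12).reshape(3, 4)
print(np.array_equal(np.dot(m1, m2), np.matmul(m1, m2)))  # True
```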
    

    Return result as a scalar value

    np.dot or np.matmul returns whatever array shape results from the input arrays. Even with the out= argument it is not possible to get a scalar back if the inputs are 2D arrays. However, if the result has shape (1,1) (or, more generally, is a single value wrapped in an nD array), the array's .item() method converts it to a Python scalar. (np.asscalar() used to do the same, but it was deprecated in NumPy 1.16 and removed in NumPy 1.23.)

    In [123]: logprobs.item()
    Out[123]: -0.7891462576187036
    
    In [124]: type(logprobs.item())
    Out[124]: float
    

    ndarray of size 1 to scalar value

    In [127]: np.array([[[23.2]]]).item()
    Out[127]: 23.2
    
    In [128]: np.array([[[[23.2]]]]).item()
    Out[128]: 23.2
    
  • 2021-01-30 23:00
    In this example it is not just a coincidence. Take two matrices, both of shape (1,3):
    
    import numpy as np
    
    x1 = np.array([[1, 2, 3]])  # first array, shape (1, 3)
    x2 = np.array([[3, 4, 3]])  # second array, shape (1, 3)
    
    # Element-wise multiplication followed by a sum:
    # (1*3) + (2*4) + (3*3) = 20
    X_Res = np.sum(np.multiply(x1, x2))
    
    # To get a (1,1) matrix from the dot product of two (1,3) matrices,
    # the second one must be transposed to (3,1):
    # |1 2 3| . |3|
    #           |4| = |1*3 + 2*4 + 3*3| = |20|
    #           |3|
    Y_Res = np.dot(x1, x2.T)
    
    print(X_Res)  # 20
    
    print(Y_Res)  # [[20]]
    