Confused about keras Dot Layer. How is the Dot product computed?

Question

I have read every post I could find about the Dot layer, but none of them explains how its values, and therefore its output shape, are computed. It seems like such a standard operation! How exactly are the values computed along a specific axis?

import numpy as np
from keras import backend as K
from keras.layers import Dot

val = np.random.randint(2, size=(2, 3, 4))
a = K.variable(value=val)
val2 = np.random.randint(2, size=(2, 2, 3))
b = K.variable(value=val)
print("a")
print(val)
print("b")
print(val2)
out = Dot(axes = 2)([a,b])
print(out.shape)
print("DOT")
print(K.eval(out))

I get:

a
[[[0 1 1 1]
  [1 1 0 0]
  [0 0 1 1]]

 [[1 1 1 0]
  [0 0 1 0]
  [0 1 0 0]]]
b
[[[1 0 1]
  [1 0 1]]

 [[1 0 1]
  [1 1 0]]]
(2, 3, 3)
DOT
[[[ 3.  1.  2.]
  [ 1.  2.  0.]
  [ 2.  0.  2.]]

 [[ 3.  1.  1.]
  [ 1.  1.  0.]
  [ 1.  0.  1.]]]

Even with my knowledge of matrix algebra, I cannot work out how this is computed.

Answer 1:

Here's how the Dot layer works. Internally, it calls K.batch_dot.

First, I think you intended to write:

val = np.random.randint(2, size=(2, 3, 4))
a = K.variable(value=val)
val2 = np.random.randint(2, size=(2, 2, 3))
b = K.variable(value=val2) # the question has val here

Fortunately, what you actually had was the following (which may have been your intention anyway; I'm just pointing it out):

b = K.variable(value=val)

With the intended code, Dot would throw an error, because the sizes of the axis you take the dot product over do not match (4 in a versus 3 in b). Moving on.

How the dot product is computed
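The mismatch can be seen without Keras. The following is a minimal NumPy sketch that uses np.einsum as a stand-in for contracting axis 2 of both inputs; it is not the code Keras runs internally, and the name b_intended is made up for illustration.

```python
import numpy as np

a = np.zeros((2, 3, 4))
b_intended = np.zeros((2, 2, 3))  # shape of val2 in the question

# axes=2 contracts axis 2 of both inputs, so the two sizes must be equal.
print(a.shape[2], b_intended.shape[2])  # 4 3

try:
    # "j" labels the contracted axis: size 4 in a, size 3 in b_intended.
    np.einsum("nij,nkj->nik", a, b_intended)
    mismatch = False
except ValueError:
    mismatch = True

print(mismatch)
```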

You have

a.shape = (2,3,4)
b.shape = (2,3,4)

First, the dot product is computed separately for each sample in the batch, so the batch dimension stays as it is.

Now you can ignore the first dimension of both a and b and consider two (3,4) matrices. Taking the dot product over the last axis means dotting every row of the first matrix with every row of the second, which results in a (3,3) matrix. Add the batch dimension back and you get a

(2, 3, 3) tensor
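The shape reasoning above can be sketched in plain NumPy. This assumes that Dot(axes=2) on two (2,3,4) inputs behaves like the einsum below, which is an illustration of the arithmetic rather than Keras's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(2, 3, 4))
b = a.copy()  # the question built b from the same values as a

# For each batch index n: out[n, i, k] = sum_j a[n, i, j] * b[n, k, j]
out = np.einsum("nij,nkj->nik", a, b)
print(out.shape)  # (2, 3, 3)

# Equivalent view: a batched matrix product of a with b transposed.
same = np.array_equal(out, a @ b.transpose(0, 2, 1))
print(same)
```

Seen this way, the contracted axis (size 4) disappears and the two remaining non-batch axes (size 3 each) form the output matrix.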

Now take your example. You have:

a
[[[0 1 1 1]
  [1 1 0 0]
  [0 0 1 1]]

 [[1 1 1 0]
  [0 0 1 0]
  [0 1 0 0]]]

b
[[[0 1 1 1]
  [1 1 0 0]
  [0 0 1 1]]

 [[1 1 1 0]
  [0 0 1 0]
  [0 1 0 0]]]

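Then, for each sample, you take the dot product of every row of a with every row of b. The pairs listed below produce the diagonal entries; each off-diagonal entry (i, j) comes from pairing row i of a with row j of b in the same way.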

# 1st sample
[0 1 1 1] . [0 1 1 1]
[1 1 0 0] . [1 1 0 0]
[0 0 1 1] . [0 0 1 1]

# 2nd sample
[1 1 1 0] . [1 1 1 0]
[0 0 1 0] . [0 0 1 0]
[0 1 0 0] . [0 1 0 0]

This gives,

# 1st sample
[3 1 2]
[1 2 0]
[2 0 2]

# 2nd sample
[3 1 1]
[1 1 0]
[1 0 1]

Finally, adding the batch dimension back, you get:

[[[ 3.  1.  2.]
  [ 1.  2.  0.]
  [ 2.  0.  2.]]

 [[ 3.  1.  1.]
  [ 1.  1.  0.]
  [ 1.  0.  1.]]]
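As a check, here is a small NumPy sketch that rebuilds this result from the values in the question with explicit loops. It only illustrates the arithmetic under the assumption that b holds the same values as a, as in the question's code; it is not how Keras implements the layer.

```python
import numpy as np

a = np.array([[[0, 1, 1, 1],
               [1, 1, 0, 0],
               [0, 0, 1, 1]],
              [[1, 1, 1, 0],
               [0, 0, 1, 0],
               [0, 1, 0, 0]]])
b = a  # b was created from val in the question, so it equals a

batch, rows_a, _ = a.shape
rows_b = b.shape[1]
out = np.zeros((batch, rows_a, rows_b))

for n in range(batch):           # each sample is handled independently
    for i in range(rows_a):      # every row of a[n] ...
        for k in range(rows_b):  # ... is dotted with every row of b[n]
            out[n, i, k] = np.dot(a[n, i], b[n, k])

print(out)
```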

Source: https://stackoverflow.com/questions/59502733/confused-about-keras-dot-layer-how-is-the-dot-product-computed

Tags

keras

keras-layer