Question
I am trying to implement tensorflow's conv1d using numpy operations, ignoring strides and padding for now. I thought I understood it after my previous question but realized today that I was still not getting the right answer when dealing with kernels wider than 1.
So now I am trying to use tflearn as a template, because it computes the kernel shape for me. Now that I understand that the convolution can be computed as a matrix multiplication, I am attempting to use the kernel matrix accordingly, but I am not getting the same answer as tflearn. Examining the source code doesn't help much, because it just calls out to tensorflow's dedicated compiled implementations.
Here's what I've got so far:
import numpy as np
import tensorflow as tf
import tflearn

inp = np.arange(10).reshape(1, 10, 1).astype(np.float32)
filters = 2
width = 3
z = tflearn.conv_1d(inp, filters, width, strides=1, padding='same', bias=False)
s = tf.Session()
s.run(tf.global_variables_initializer())
z1, w = s.run([z, z.W])  # z.W is the kernel variable created by tflearn
print('tflearn output shape', z1.shape)
print('tflearn kernel shape', w.shape)
print('numpy matmul shape', (inp @ w).shape)
This indicates that the tflearn kernel puts the width as an extra dimension inserted at the beginning:
tflearn output shape (1, 10, 2)
tflearn kernel shape (3, 1, 1, 2)
numpy matmul shape (3, 1, 10, 2)
Accordingly, the result I get has that extra dimension of size 3. Fine, so how do I correctly reduce it to get the same answer as tensorflow? I tried summing over this dimension, but it is not correct:
print('tflearn output:')
print(z1)
print('numpy output:')
print(np.sum(inp @ w, axis=0))
gives,
tflearn output:
[[[-0.02252221 0.24712706]
[ 0.49539018 1.0828717 ]
[ 0.0315876 2.0945265 ]
[-0.43221498 3.1061814 ]
[-0.89601755 4.117836 ]
[-1.3598201 5.129491 ]
[-1.823623 6.141146 ]
[-2.2874253 7.152801 ]
[-2.7512276 8.164455 ]
[-2.989808 6.7048397 ]]]
numpy output:
[[[ 0. 0. ]
[-0.46380258 1.0116549 ]
[-0.92760515 2.0233097 ]
[-1.3914077 3.0349646 ]
[-1.8552103 4.0466194 ]
[-2.319013 5.0582743 ]
[-2.7828155 6.069929 ]
[-3.2466178 7.0815845 ]
[-3.7104206 8.093239 ]
[-4.174223 9.104893 ]]]
which are clearly different. z.W has of course been initialized to random values, so these numbers are random too, but I am looking for the numpy calculation that would make them equal to z1, since they are using the same kernel. Clearly it is not as simple as inp @ w.
Thanks.
Answer 1:
Okay sorry, I have answered my own question after some thought... THIS is where the sliding window operation comes in that I was trying to introduce in my previous question:
y = inp @ w  # shape (3, 1, 10, 2): one matmul result per kernel position
y[0, :, :-2, :] + y[1, :, 1:-1, :] + y[2, :, 2:, :]  # shift each slice by its kernel offset and sum
gives,
array([[[ 0.49539018, 1.0828717 ],
[ 0.0315876 , 2.0945265 ],
[-0.43221498, 3.1061814 ],
[-0.89601755, 4.117836 ],
[-1.3598201 , 5.129491 ],
[-1.823623 , 6.141146 ],
[-2.2874253 , 7.152801 ],
[-2.7512276 , 8.164455 ]]], dtype=float32)
which is equal to z1 ignoring the first and last rows, exactly what I'd expect from a 3-point convolution.
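The first and last rows could presumably be recovered the same way by zero-padding one step on each side of the length axis, which is what I'd expect 'same' padding to do for a width-3 kernel (inp_padded and yp are just illustrative names, and I haven't verified this against every padding case):
inp_padded = np.pad(inp, ((0, 0), (1, 1), (0, 0)))      # zero-pad the length axis by one on each side
yp = inp_padded @ w                                      # shape (3, 1, 12, 2)
yp[0, :, :-2, :] + yp[1, :, 1:-1, :] + yp[2, :, 2:, :]   # shape (1, 10, 2), should match z1 including the edge rows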
Edit: but I would be much obliged if someone could propose a more succinct / efficient way to express the sliding window.. I thought from my previous question that even the sliding window could be taken into account in the matrix multiplication, so it's unfortunate to need to write the indexing logic explicitly.
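One possibility, as a rough sketch assuming NumPy 1.20+ for np.lib.stride_tricks.sliding_window_view (windows and out are just illustrative names), is to gather all the shifted views at once and contract them with the kernel in a single einsum:
# windows over the length axis: shape (1, 8, 1, 3) = (batch, position, in_channels, width)
windows = np.lib.stride_tricks.sliding_window_view(inp, width, axis=1)
# contract the width and input-channel axes against the kernel slice w[:, 0], shape (width, in_channels, out_channels)
out = np.einsum('bpcw,wco->bpo', windows, w[:, 0])       # shape (1, 8, 2), same as the shifted sum above
This avoids writing the per-offset indexing by hand, though the shifts are still there implicitly in the strided view.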
Source: https://stackoverflow.com/questions/59553815/implementing-conv1d-with-numpy-operations