Question
I cannot fully understand the behaviour of theano.scan().
Here's an example:
import numpy as np
import theano
import theano.tensor as T
def addf(a1,a2):
    return a1+a2
i = T.iscalar('i')
x0 = T.ivector('x0')
step= T.iscalar('step')
results, updates = theano.scan(fn=addf,
                               outputs_info=[{'initial':x0, 'taps':[-2]}],
                               non_sequences=step,
                               n_steps=i)
f=theano.function([x0,i,step],results)
print(f([1,1],10,2))
The above snippet prints the following sequence, which is perfectly reasonable:
[ 3 3 5 5 7 7 9 9 11 11]
However, if I switch the tap index from -2 to -1, i.e.
outputs_info=[{'initial':x0, 'taps':[-1]}]
The result becomes:
[[ 3 3]
[ 5 5]
[ 7 7]
[ 9 9]
[11 11]
[13 13]
[15 15]
[17 17]
[19 19]
[21 21]]
instead of what would seem reasonable to me (just take the last value of the vector and add 2):
[ 3 5 7 9 11 13 15 17 19 21]
Any help would be much appreciated.
Thanks!
Answer 1:
When you use taps=[-1], scan assumes that the value given in outputs_info is a single previous state and uses it as is. That means the addf function will be called with a vector and the non-sequence as inputs. If you change x0 to a scalar, it works as you expect:
import numpy as np
import theano
import theano.tensor as T
def addf(a1,a2):
    # Print the symbolic types scan passes to the inner function
    print(a1.type)
    print(a2.type)
    return a1+a2
i = T.iscalar('i')
x0 = T.iscalar('x0')
step= T.iscalar('step')
results, updates = theano.scan(fn=addf,
                               outputs_info=[{'initial':x0, 'taps':[-1]}],
                               non_sequences=step,
                               n_steps=i)
f=theano.function([x0,i,step],results)
print(f(1,10,2))
This gives the following output:
TensorType(int32, scalar)
TensorType(int32, scalar)
[ 3 5 7 9 11 13 15 17 19 21]
In your case, scan calls addf(vector, scalar), so the scalar is broadcast elementwise across the vector.
Put another way: if taps is [-1], x0 is passed "as is" to the inner function. If taps contains anything else, what is passed to the inner function has one dimension fewer than x0, because x0 must then supply several initial step values (here, the ones for steps -2 and -1).
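To make the rule above concrete, here is a rough NumPy sketch that mimics how a single recurrent output is fed back for different taps. It does not use Theano at all: emulate_scan is a hypothetical helper written only for this illustration, and it assumes exactly one output and one non-sequence.

```python
import numpy as np

def addf(a1, a2):
    return a1 + a2

def emulate_scan(fn, initial, taps, step, n_steps):
    """Hypothetical stand-in for theano.scan with one recurrent output."""
    initial = np.asarray(initial)
    if taps == [-1]:
        # The initial value is one past state and is handed to fn unchanged.
        history = [initial]
    else:
        # The leading axis of `initial` indexes past time steps, so each
        # state has one dimension fewer than `initial`.
        history = list(initial)
    n_init = len(history)
    for _ in range(n_steps):
        # A tap of -k means "the state k steps before the one being computed".
        args = [history[len(history) + t] for t in taps]
        history.append(fn(*(args + [step])))
    # Drop the initial states; only the computed steps are returned.
    return np.array(history[n_init:])

two_back = emulate_scan(addf, [1, 1], [-2], 2, 10)    # vector read as two scalar states
one_back_vec = emulate_scan(addf, [1, 1], [-1], 2, 10)  # vector is the state itself
one_back_scalar = emulate_scan(addf, 1, [-1], 2, 10)    # scalar state

print(two_back)            # [ 3  3  5  5  7  7  9  9 11 11]
print(one_back_vec.shape)  # (10, 2)
print(one_back_scalar)     # [ 3  5  7  9 11 13 15 17 19 21]
```

With taps=[-2] the two entries of the vector are treated as the states at steps -2 and -1, whereas with taps=[-1] the whole vector is the state, which is why the result gains a second dimension. If the goal is to start from the last element of a vector, passing a scalar such as x0[-1] as the initial value should presumably give the 1-D sequence as well, although that variant was not part of the original answer.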
Source: https://stackoverflow.com/questions/26718812/python-theano-scan-function