Question
I'm trying to learn TensorFlow by coding up some simple problems. Here I am estimating the value of pi with a direct-sampling Monte Carlo method. When I use a for loop to do this, the run time is much longer than I expected. I've seen other posts about similar issues and tried to follow their solutions, but I think I must still be doing something wrong.
My code is below:
import tensorflow as tf
import numpy as np
import time
n_trials = 50000
tf.reset_default_graph()
x = tf.random_uniform(shape=(), name='x')
y = tf.random_uniform(shape=(), name='y')
r = tf.sqrt(x**2 + y**2)
hit = tf.Variable(0, name='hit')
# perform the monte carlo step
is_inside = tf.cast(tf.less(r, 1), tf.int32)
hit_op = hit.assign_add(is_inside)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # Make sure no new nodes are added to the graph
    sess.graph.finalize()
    start = time.time()
    # Run monte carlo trials -- This is very slow
    for _ in range(n_trials):
        sess.run(hit_op)
    hits = hit.eval()
    print("Pi is {}".format(4*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
>>> Pi is 3.15208
>>> Tensorflow operation took 8.98 s
In comparison, a for-loop-style solution in NumPy is an order of magnitude faster:
start = time.time()
hits = [ 1 if np.sqrt(np.sum(np.square(np.random.uniform(size=2)))) < 1 else 0 for _ in range(n_trials) ]
a = 0
for hit in hits:
    a += hit
print("numpy operation took {:.2f} s".format((time.time()-start)))
print("Pi is {}".format(4*a/n_trials))
>>> Pi is 3.14032
>>> numpy operation took 0.75 s
Attached below is a plot of the difference in overall execution times for various numbers of trials.
Please note: my question is not about "how to perform this task the fastest", I recognize there are much more effective ways of calculating Pi. I've only used this as a benchmarking tool to check the performance of tensorflow against something I'm familiar with (numpy).
Answer 1:
The slowdown comes from the communication overhead between Python and TensorFlow in sess.run, which is executed once per iteration of your loop. I would suggest using tf.while_loop to run the computation inside TensorFlow. That would be a fairer comparison with NumPy.
import tensorflow as tf
import numpy as np
import time
n_trials = 50000
tf.reset_default_graph()
hit = tf.Variable(0, name='hit')
def body(ctr):
    # Draw one (x, y) point and test whether it falls inside the unit circle
    x = tf.random_uniform(shape=[2], name='x')
    r = tf.sqrt(tf.reduce_sum(tf.square(x)))
    is_inside = tf.cond(tf.less(r, 1), lambda: tf.constant(1), lambda: tf.constant(0))
    hit_op = hit.assign_add(is_inside)
    # Force the counter update to run before the loop counter advances
    with tf.control_dependencies([hit_op]):
        return ctr + 1

def condition(ctr):
    return ctr < n_trials

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    result = tf.while_loop(condition, body, [tf.constant(0)])
    start = time.time()
    sess.run(result)
    hits = hit.eval()
    print("Pi is {}".format(4.*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
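As a rough back-of-the-envelope check, the timings reported in the question give an idea of how large the per-call overhead is. The sketch below assumes that essentially all of the 8.98 s is spent in the 50,000 sess.run calls, which slightly overstates the cost of a single call; the variable names are illustrative only.

```python
# Figures taken from the output reported in the question.
n_trials = 50000
tf_loop_seconds = 8.98      # 50,000 separate sess.run calls
numpy_loop_seconds = 0.75   # Python loop over NumPy calls

# Assumption: essentially all of the time is per-call overhead.
per_call_ms = tf_loop_seconds / n_trials * 1000
slowdown = tf_loop_seconds / numpy_loop_seconds

print("approx. cost per sess.run call: {:.2f} ms".format(per_call_ms))
print("slowdown relative to the NumPy loop: {:.1f}x".format(slowdown))
```

Under that assumption each call costs about 0.18 ms, and the TensorFlow loop is about 12 times slower than the NumPy loop, which matches the "order of magnitude" described in the question.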
Answer 2:
Simply put, session.run has a lot of overhead and is not designed to be used this way. Normally, with a neural net for example, a single session.run call performs a dozen multiplications of large matrices, so the roughly 0.2 ms of overhead per call does not matter at all. In your case, you probably wanted something like the code below. It runs 5 times faster than the NumPy version on my machine.
By the way, you do exactly the same thing in NumPy. If you used a loop for the reduction instead of np.sum, it would be much slower.
import tensorflow as tf
import numpy as np
import time
n_trials = 50000
tf.reset_default_graph()
x = tf.random_uniform(shape=(n_trials,), name='x')
# y needs one sample per trial as well, otherwise every point shares the same y
y = tf.random_uniform(shape=(n_trials,), name='y')
r = tf.sqrt(x**2 + y**2)
hit = tf.Variable(0, name='hit')
# perform the monte carlo step for all trials at once
is_inside = tf.cast(tf.less(r, 1), tf.int32)
hit2 = tf.reduce_sum(is_inside)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Make sure no new nodes are added to the graph
    sess.graph.finalize()
    start = time.time()
    # A single sess.run call evaluates every trial; no Python loop is needed
    hits = sess.run(hit2)
    print("Pi is {}".format(4*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
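For reference, the same batching idea applies to NumPy itself. The sketch below is not from the original answers: it draws all points in one call and reduces with np.sum instead of looping in Python. The function name estimate_pi and the fixed seed are illustrative choices.

```python
import numpy as np

def estimate_pi(n_trials, seed=0):
    """Estimate pi by sampling n_trials points in the unit square."""
    rng = np.random.RandomState(seed)
    # One call draws every (x, y) pair: shape (n_trials, 2).
    points = rng.uniform(size=(n_trials, 2))
    # Compare the squared radius with 1; the sqrt is not needed for this test.
    inside = np.sum(points ** 2, axis=1) < 1.0
    return 4.0 * np.sum(inside) / n_trials

print("Pi is approximately {:.4f}".format(estimate_pi(50000)))
```

Timing this against the list-comprehension version from the question should show the same effect as in TensorFlow: the fewer round trips between Python and the numerical library, the smaller the share of time spent on overhead.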
Source: https://stackoverflow.com/questions/42860617/evaluating-tensorflow-operation-is-very-slow-in-a-loop