Evaluating Tensorflow operation is very slow in a loop

Question

I'm trying to learn tensorflow by coding up some simple problems: I was trying to find the value of pi using a direct sampling Monte Carlo method.

The run time is much longer than I thought it would be when using a for loop to do this. I've seen other posts about similar things and I've tried to follow the solutions, but I think I still must be doing something wrong.

Attached below is my code:

import tensorflow as tf
import numpy as np
import time

n_trials = 50000

tf.reset_default_graph()


x = tf.random_uniform(shape=(), name='x')
y = tf.random_uniform(shape=(), name='y')
r = tf.sqrt(x**2 + y**2)

hit = tf.Variable(0, name='hit')

# perform the monte carlo step
is_inside = tf.cast(tf.less(r, 1), tf.int32)
hit_op = hit.assign_add(is_inside) 

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # Make sure no new nodes are added to the graph
    sess.graph.finalize()

    start = time.time()   

    # Run monte carlo trials  -- This is very slow
    for _ in range(n_trials):
        sess.run(hit_op)

    hits = hit.eval()
    print("Pi is {}".format(4*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))

>>> Pi is 3.15208
>>> Tensorflow operation took 8.98 s

In comparison, a for-loop solution in numpy is an order of magnitude faster:

start = time.time()   
hits = [ 1 if np.sqrt(np.sum(np.square(np.random.uniform(size=2)))) < 1 else 0 for _ in range(n_trials) ]
a = 0
for hit in hits:
    a+=hit
print("numpy operation took {:.2f} s".format((time.time()-start)))
print("Pi is {}".format(4*a/n_trials))

>>> Pi is 3.14032
>>> numpy operation took 0.75 s

Attached below is a plot of the difference in overall execution times for various numbers of trials.

Please note: my question is not about "how to perform this task the fastest", I recognize there are much more effective ways of calculating Pi. I've only used this as a benchmarking tool to check the performance of tensorflow against something I'm familiar with (numpy).

Answer 1:

The slowdown comes from the communication overhead between Python and TensorFlow in sess.run, which is executed once per iteration of your loop. I would suggest using tf.while_loop to execute the computation within TensorFlow. That would be a fairer comparison with numpy.

import tensorflow as tf
import numpy as np
import time

n_trials = 50000

tf.reset_default_graph()

hit = tf.Variable(0, name='hit')

def body(ctr):
    x = tf.random_uniform(shape=[2], name='x')
    r = tf.sqrt(tf.reduce_sum(tf.square(x)))
    is_inside = tf.cond(tf.less(r,1), lambda: tf.constant(1), lambda: tf.constant(0))
    hit_op = hit.assign_add(is_inside)
    with tf.control_dependencies([hit_op]):
        return ctr + 1

def condition(ctr):
    return ctr < n_trials

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    result = tf.while_loop(condition, body, [tf.constant(0)])

    start = time.time()
    sess.run(result)

    hits = hit.eval()
    print("Pi is {}".format(4.*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
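
As a rough check on the overhead explanation, the timings reported in the question can be turned into an average cost per trial. This is only a back-of-the-envelope sketch: it assumes the actual arithmetic per trial is negligible, and the helper name per_call_overhead_ms is made up for illustration.

```python
def per_call_overhead_ms(total_seconds, n_calls):
    """Average wall-clock cost of one call, in milliseconds."""
    return 1000.0 * total_seconds / n_calls

# Timings taken from the question: 50000 trials each.
tf_loop_ms = per_call_overhead_ms(8.98, 50000)     # one sess.run per trial
numpy_loop_ms = per_call_overhead_ms(0.75, 50000)  # numpy list comprehension

print("sess.run loop: {:.3f} ms per trial".format(tf_loop_ms))   # 0.180
print("numpy loop:    {:.3f} ms per trial".format(numpy_loop_ms))  # 0.015
```

Roughly 0.18 ms per sess.run call for a computation this small suggests that almost all of the time goes into the call itself rather than into the math.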

Answer 2:

Simply put, session.run has a lot of overhead and is not designed to be used this way. Normally, with a neural net for example, a single session.run call performs a dozen multiplications of big matrices, so the 0.2 ms of overhead per call does not matter at all. In your case, you probably wanted something like the code below, which runs 5 times faster than the numpy version on my machine.

By the way, you do exactly the same thing in numpy: if you used a loop to reduce instead of np.sum, it would be much slower.

    import tensorflow as tf
    import numpy as np
    import time

    n_trials = 50000

    tf.reset_default_graph()

    x = tf.random_uniform(shape=(n_trials,), name='x')
    # y needs the same shape as x so that every trial gets its own (x, y) point
    y = tf.random_uniform(shape=(n_trials,), name='y')
    r = tf.sqrt(x**2 + y**2)

    hit = tf.Variable(0, name='hit')

    # perform the monte carlo step
    is_inside = tf.cast(tf.less(r, 1), tf.int32)
    # sum the hits of all trials inside the graph
    hit2 = tf.reduce_sum(is_inside)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Make sure no new nodes are added to the graph
        sess.graph.finalize()

        start = time.time()

        # All trials are evaluated in a single sess.run call, so no Python loop is needed.
        # Reuse its result: evaluating hit2 a second time would draw new random samples.
        hits = sess.run(hit2)

        print("Pi is {}".format(4*hits/n_trials))
        print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
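
The remark about np.sum can be tried out in plain numpy. The following is a minimal sketch, not a careful benchmark: it draws all points at once and then counts the hits twice, once with np.sum and once with a Python-level loop. The fixed seed and the variable names are my own choices, and the timings will vary by machine.

```python
import time
import numpy as np

n_trials = 50000
rng = np.random.RandomState(0)  # fixed seed (an assumption, for repeatability)

# Draw every point at once: one (n_trials, 2) array instead of n_trials tiny arrays.
points = rng.uniform(size=(n_trials, 2))
# Comparing the squared radius with 1 is equivalent to r < 1 and skips the sqrt.
inside = np.sum(np.square(points), axis=1) < 1.0

# Reduce inside numpy with a single call.
start = time.time()
hits_vectorized = int(np.sum(inside))
t_vectorized = time.time() - start

# Reduce the same data with a Python-level loop.
start = time.time()
hits_loop = 0
for flag in inside:
    hits_loop += int(flag)
t_loop = time.time() - start

pi_estimate = 4.0 * hits_vectorized / n_trials
print("Pi is {}".format(pi_estimate))
print("np.sum took {:.6f} s, Python loop took {:.6f} s".format(t_vectorized, t_loop))
```

Both reductions give the same count; only the number of round trips between Python and the compiled code differs, which is the same effect as calling sess.run once instead of 50000 times.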

Source: https://stackoverflow.com/questions/42860617/evaluating-tensorflow-operation-is-very-slow-in-a-loop

Tags

python

tensorflow

montecarlo