TensorFlow: simultaneous prediction on GPU and CPU

隐瞒了意图╮ 2021-02-04 14:38

I'm working with TensorFlow and I want to speed up the prediction phase of a pre-trained Keras model (I'm not interested in the training phase) by using

1 Answer
  • 2021-02-04 15:13

    Here's my code that demonstrates how CPU and GPU execution can be done in parallel:

    # Note: this script uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
    import tensorflow as tf
    import numpy as np
    from time import time
    from threading import Thread
    
    n = 1024 * 8
    
    # The CPU batch has 16x fewer rows than the GPU batch.
    data_cpu = np.random.uniform(size=[n//16, n]).astype(np.float32)
    data_gpu = np.random.uniform(size=[n    , n]).astype(np.float32)
    
    with tf.device('/cpu:0'):
        x = tf.placeholder(name='x', dtype=tf.float32)
    
    def get_var(name):
        return tf.get_variable(name, shape=[n, n])
    
    def op(name):
        w = get_var(name)
        y = x
        for _ in range(8):
            y = tf.matmul(y, w)
        return y
    
    with tf.device('/cpu:0'):
        cpu = op('w_cpu')
    
    with tf.device('/gpu:0'):
        gpu = op('w_gpu')
    
    def f(session, y, data):
        return session.run(y, feed_dict={x : data})
    
    
    with tf.Session(config=tf.ConfigProto(log_device_placement=True, intra_op_parallelism_threads=8)) as sess:
        sess.run(tf.global_variables_initializer())
    
        coord = tf.train.Coordinator()
    
        threads = []
    
        # Keep both lines to run CPU and GPU concurrently,
        # or comment one out to time a single device:
        threads += [Thread(target=f, args=(sess, cpu, data_cpu))]
        threads += [Thread(target=f, args=(sess, gpu, data_gpu))]
    
        t0 = time()
    
        for t in threads:
            t.start()
    
        coord.join(threads)
    
        t1 = time()
    
    
    print(t1 - t0)
    

    The timing results are:

    • CPU thread only: 4-5 s (will vary by machine, of course).

    • GPU thread only: 5 s (it does 16x as much work).

    • Both at the same time: 5 s
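The "16x as much work" figure follows from the input shapes in the script. Here is a small sketch of that arithmetic, assuming the usual estimate of 2·m·k·n floating-point operations for a dense [m, k] × [k, n] matrix product. The helper name `matmul_flops` is made up for illustration and is not part of TensorFlow.

```python
def matmul_flops(m, k, n):
    """Approximate FLOPs for an [m, k] x [k, n] dense matmul.

    Assumes one multiply and one add per inner-product term.
    """
    return 2 * m * k * n


n = 1024 * 8
chain_length = 8  # op() applies tf.matmul eight times in a row

# The CPU input is [n // 16, n] and the GPU input is [n, n];
# both are multiplied by an [n, n] weight matrix.
cpu_flops = chain_length * matmul_flops(n // 16, n, n)
gpu_flops = chain_length * matmul_flops(n, n, n)

print("CPU workload: %.2e FLOPs" % cpu_flops)
print("GPU workload: %.2e FLOPs" % gpu_flops)
print("GPU / CPU work ratio:", gpu_flops // cpu_flops)
```

Since both devices finish in about the same wall-clock time, the GPU is sustaining roughly an order of magnitude more throughput on this workload.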

    Note that there was no need to have 2 sessions (but that worked for me too).

    If you are seeing different results, possible reasons include:

    • contention for system resources (GPU execution also consumes some host resources, so a CPU thread that crowds the host can slow the GPU work down)

    • incorrect timing

    • part of your model can only run on GPU/CPU

    • bottleneck elsewhere

    • some other problem
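To rule out the "incorrect timing" case, it can help to time each workload alone and then both together using the same wall-clock harness. Below is a minimal sketch of such a harness. The two `fake_*` functions are hypothetical stand-ins that just sleep; in a real test they would be replaced by the `session.run` calls from the script above. The sleep durations are arbitrary.

```python
import time
from threading import Thread


def timed(*workloads):
    """Run each workload in its own thread; return total wall-clock seconds."""
    threads = [Thread(target=w) for w in workloads]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # stop the clock only after every thread has finished
    return time.perf_counter() - start


# Hypothetical stand-ins for the CPU and GPU session.run calls.
def fake_cpu_work():
    time.sleep(0.2)


def fake_gpu_work():
    time.sleep(0.3)


cpu_only = timed(fake_cpu_work)
gpu_only = timed(fake_gpu_work)
both = timed(fake_cpu_work, fake_gpu_work)

print("CPU only: %.2f s" % cpu_only)
print("GPU only: %.2f s" % gpu_only)
print("Both:     %.2f s" % both)

# If the workloads overlap, the combined time is close to the slower one
# rather than to the sum of the two.
print("Overlapping:", both < cpu_only + gpu_only)
```

If the combined time is close to the sum of the individual times, the two workloads are effectively running one after the other. The contention and bottleneck items in the list above are then the next things to check.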
