TensorFlow: Where is tf.nn.conv2d Actually Executed?

感动是毒 2020-12-30 03:34

I am curious about the TensorFlow implementation of tf.nn.conv2d(...). To call it, one simply runs tf.nn.conv2d(...). However, I'm going down the

2 Answers
  • 2020-12-30 04:07

    TL;DR: The implementation of tf.nn.conv2d() is written in C++, which invokes optimized code using either Eigen (on CPU) or the cuDNN library (on GPU). You can find the implementation here.

    The chain of functions that you mentioned in the question (from tf.nn.conv2d() down) are Python functions for building a TensorFlow graph, but these do not invoke the implementation. Recall that, in TensorFlow, you first build a symbolic graph, then execute it.
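
To make the "build first, execute later" split concrete, here is a minimal pure-Python sketch. It is a toy model and not TensorFlow's actual classes: `Node`, `placeholder`, `multiply`, `run`, and `calls` are all hypothetical names invented for illustration. Building the node records what to compute; the work itself only happens inside `run()`.

```python
# Toy illustration of "build first, run later"; not TensorFlow's real classes.

class Node:
    """A symbolic node: remembers its op and inputs but holds no value."""

    def __init__(self, op, inputs=(), fn=None):
        self.op = op                # op name, e.g. "Conv2D" in real TensorFlow
        self.inputs = list(inputs)  # upstream nodes this node depends on
        self.fn = fn                # stand-in "kernel", only called in run()


calls = []  # records every kernel invocation so we can see when work happens


def placeholder():
    """Create a node whose value must be supplied at run time."""
    return Node("Placeholder")


def multiply(a, b):
    """Build a Mul node; nothing is multiplied at this point."""
    def kernel(x, y):
        calls.append("Mul")
        return x * y
    return Node("Mul", [a, b], kernel)


def run(fetch, feed_dict):
    """Evaluate `fetch` recursively, taking placeholder values from feed_dict."""
    if fetch in feed_dict:
        return feed_dict[fetch]
    args = [run(node, feed_dict) for node in fetch.inputs]
    return fetch.fn(*args)


x = placeholder()
y = multiply(x, x)                      # graph construction only
built_without_running = (calls == [])   # no kernel has been called yet
result = run(y, {x: 3})                 # the kernel runs only now
```

In this sketch, `multiply` plays the role of a Python wrapper such as tf.nn.conv2d(), and `run` plays the role of Session.run().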

    The implementation of tf.nn.conv2d() is only executed when you call Session.run(), passing a Tensor whose value depends on the result of some convolution. For example:

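    import tensorflow as tf  # TensorFlow 1.x graph-mode API

    input = tf.placeholder(tf.float32)
    filter = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
    conv = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

    # `sess` is assumed to be a tf.Session in which the variables are initialized.
    result = sess.run(conv, feed_dict={input: ...})  # <== Execution happens here.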
    

    Invoking sess.run(...) tells TensorFlow to run all the ops that are needed to compute the value of conv, including the convolution itself. The path from here to the implementation is somewhat complicated, but goes through the following steps:

    1. sess.run() calls the TensorFlow backend to fetch the value of conv.
    2. The backend prunes the computation graph to work out what nodes must be executed, and places the nodes on the appropriate devices (CPU or GPU).
    3. Each device is instructed to execute its subgraph, using an executor.
    4. The executor eventually invokes the tensorflow::OpKernel that corresponds to the convolution operator, by calling its Compute() method.
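
Steps 2 and 4 above can be sketched in plain Python. This is a simplified model under stated assumptions, not TensorFlow's code: the graph layout, `prune`, `KERNELS`, `run_op`, and the two kernel classes are hypothetical, and the real kernels are C++ classes whose Compute() method receives a context object.

```python
# Toy model of pruning and kernel dispatch; all names here are hypothetical.

# Graph as {node_name: (op_type, [input_names])}.
graph = {
    "input":  ("Placeholder", []),
    "filter": ("Variable", []),
    "conv":   ("Conv2D", ["input", "filter"]),
    "unused": ("Relu", ["input"]),   # not needed to compute "conv"
}


def prune(graph, fetch):
    """Return the set of nodes that `fetch` transitively depends on."""
    needed, stack = set(), [fetch]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name][1])
    return needed


class Conv2DCpuKernel:
    def compute(self):
        return "Conv2D on CPU (Eigen in real TensorFlow)"


class Conv2DGpuKernel:
    def compute(self):
        return "Conv2D on GPU (cuDNN in real TensorFlow)"


# Kernel registry keyed by (op type, device type).
KERNELS = {
    ("Conv2D", "CPU"): Conv2DCpuKernel,
    ("Conv2D", "GPU"): Conv2DGpuKernel,
}


def run_op(op_type, device):
    """Look up the kernel registered for this op and device, then run it."""
    kernel = KERNELS[(op_type, device)]()
    return kernel.compute()


needed = prune(graph, "conv")               # "unused" is dropped
message = run_op(graph["conv"][0], "GPU")   # dispatch on op type and device
```

The point of the registry is that the same "Conv2D" node can resolve to different implementations depending on where the node was placed.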

    The "Conv2D" OpKernel is implemented here, and its Compute() method is here. Because this op is performance critical for many workloads, the implementation is quite complicated, but the basic idea is that the computation is offloaded to either the Eigen Tensor library (if running on CPU), or cuDNN's optimized GPU implementation.
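
For reference, the arithmetic that the kernel ultimately has to produce can be written as a naive loop nest. The sketch below is a hypothetical helper (`conv2d_same`), restricted to one image, stride 1 and 'SAME' padding, with the input laid out as [height][width][in_channels] and the filter as [filter_height][filter_width][in_channels][out_channels]. It only illustrates the result; Eigen and cuDNN compute it with far more optimized algorithms.

```python
def conv2d_same(image, kernel):
    """Naive stride-1, SAME-padded 2-D convolution (no filter flip).

    image:  nested lists shaped [height][width][in_channels]
    kernel: nested lists shaped [k_h][k_w][in_channels][out_channels]
    """
    height, width, in_ch = len(image), len(image[0]), len(image[0][0])
    k_h, k_w, out_ch = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    pad_top, pad_left = (k_h - 1) // 2, (k_w - 1) // 2

    out = [[[0.0] * out_ch for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for i in range(k_h):
                for j in range(k_w):
                    src_y, src_x = y + i - pad_top, x + j - pad_left
                    if not (0 <= src_y < height and 0 <= src_x < width):
                        continue  # outside the image: implicit zero padding
                    for c in range(in_ch):
                        for o in range(out_ch):
                            out[y][x][o] += (
                                image[src_y][src_x][c] * kernel[i][j][c][o]
                            )
    return out


# 3x3 single-channel image and a 3x3 all-ones filter (a box sum).
image = [[[1.0], [2.0], [3.0]],
         [[4.0], [5.0], [6.0]],
         [[7.0], [8.0], [9.0]]]
kernel = [[[[1.0]] for _ in range(3)] for _ in range(3)]
result = conv2d_same(image, kernel)
```

With the all-ones filter, each output pixel is the sum of the input pixels under the 3x3 window, so the centre value is the sum of the whole image and the corners only see the four pixels that fall inside the image.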

  • 2020-12-30 04:08

    You can think of TensorFlow programs as consisting of two discrete sections:

    • Building the computational graph.

    tf.nn.conv2d(...) -> tf.nn_ops.conv2d(...) -> tf.gen_nn_ops.conv2d(...) -> _op_def_lib.apply_op("Conv2D", ...) -> graph.create_op -> register op into graph

    • Running the computational graph.

    sess = tf.Session(target) -> sess.run(conv2d) -> the master prunes the full graph to the client graph -> the master splits the client graph by task into graph partitions -> the graph partitions are registered with the workers -> each worker splits its subgraph by device into graph partitions -> the master notifies all workers to run their graph partitions -> each worker notifies all of its devices to run their graph partitions -> the executor runs the ops on each device in topological order.

    For each op, the executor invokes the kernel implementation to compute that op's outputs.

    The kernel implementation of tf.nn.conv2d() is written in C++, which invokes optimized code using either Eigen (on CPU) or the cuDNN library (on GPU).
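
The "split by device" and "run in topological order" steps of the running phase can be illustrated with a small sketch. Everything here is an assumption made for illustration: the graph format and the helpers `partition_by_device` and `topological_order` are hypothetical, and the code runs sequentially, whereas the real executor is asynchronous and multi-threaded.

```python
from collections import deque

# Hypothetical placed graph: {node_name: (device, [input_names])}.
graph = {
    "input":  ("CPU", []),
    "filter": ("GPU", []),
    "conv":   ("GPU", ["input", "filter"]),
    "relu":   ("GPU", ["conv"]),
}


def partition_by_device(graph):
    """Group node names by the device they were placed on."""
    parts = {}
    for name, (device, _) in graph.items():
        parts.setdefault(device, []).append(name)
    return parts


def topological_order(graph):
    """Kahn's algorithm: a node becomes ready once all its inputs have run."""
    pending = {name: len(inputs) for name, (_, inputs) in graph.items()}
    consumers = {name: [] for name in graph}
    for name, (_, inputs) in graph.items():
        for src in inputs:
            consumers[src].append(name)

    ready = deque(name for name, count in pending.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)          # a real executor would run the kernel here
        for dst in consumers[name]:
            pending[dst] -= 1
            if pending[dst] == 0:
                ready.append(dst)
    return order


parts = partition_by_device(graph)
order = topological_order(graph)
```

Here `conv` cannot run until both `input` and `filter` have produced their values, which is the ordering constraint that the executor enforces.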
