tensorflow-xla

indexing in tensorflow slower than gather

Submitted by 房东的猫 on 2020-01-24 11:05:26
Question: I am trying to index into a tensor to get a slice or a single element from 1-D tensors. I find a significant performance difference (almost 30-40%) between NumPy-style indexing ([:] and slices) and tf.gather. I also observe that tf.gather has significant overhead when used on scalars (looping over an unstacked tensor) as opposed to a tensor. Is this a known issue? Example code (inefficient):

for node_idxs in graph.nodes():
    node_indice_list = tf.unstack(node_idxs)
    result = []
    for
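A minimal micro-benchmark sketch of the comparison, assuming TF 1.x sessions; the tensor size, slice bounds, and iteration count are illustrative, not taken from the question:

import time
import tensorflow as tf

# Hypothetical 1-D tensor; size, bounds, and loop count are illustrative.
x = tf.constant(list(range(10000)), dtype=tf.int32)
sliced = x[100:200]                           # NumPy-style slice
gathered = tf.gather(x, tf.range(100, 200))   # the same slice via tf.gather

with tf.Session() as sess:
    for name, op in [("slice", sliced), ("gather", gathered)]:
        sess.run(op)  # warm-up, so one-time setup is not measured
        start = time.time()
        for _ in range(1000):
            sess.run(op)
        print(name, time.time() - start)

Timing each op over many runs like this separates per-call overhead (which is where tf.gather tends to lose on small inputs) from the one-time graph setup cost.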

Why is TensorFlow XLA in experimental status

Submitted by 筅森魡賤 on 2020-01-03 05:04:25
Question: I'm interested in using XLA for training with a custom device (FPGA, ...). However, I learned from the developer's tutorial that XLA is currently in experimental status: https://www.tensorflow.org/performance/xla/ I did not understand why XLA is experimental. Is there any big issue with XLA beyond the performance improvement? Thanks

Answer 1: XLA is still very new: it was released in March 2017. As stated on the TensorFlow XLA page: Note: XLA is experimental and considered alpha.

First tf.session.run() performs dramatically different from later runs. Why?

Submitted by 让人想犯罪 __ on 2019-12-20 12:39:10
Question: Here's an example to clarify what I mean:

First session.run(): First run of a TensorFlow session
Later session.run(): Later runs of a TensorFlow session

I understand TensorFlow is doing some initialization here, but I'd like to know where in the source this manifests. This occurs on CPU as well as GPU, but the effect is more prominent on GPU. For example, in the case of an explicit Conv2D operation, the first run has a much larger quantity of Conv2D operations in the GPU stream. In fact, if I change the input size of the Conv2D, it can go from tens to hundreds of stream Conv2D operations. In
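Much of this first-run cost is commonly attributed to one-time setup: CUDA context creation, cuDNN convolution autotuning, graph optimization, and memory-allocator warm-up. A sketch of how to see the effect, with arbitrary sizes (TF 1.x style):

import time
import numpy as np
import tensorflow as tf

# Arbitrary Conv2D graph, just to expose the first-run overhead.
x = tf.placeholder(tf.float32, [1, 224, 224, 3])
y = tf.layers.conv2d(x, filters=64, kernel_size=3)

feed = {x: np.zeros([1, 224, 224, 3], np.float32)}
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3):
        start = time.time()
        sess.run(y, feed_dict=feed)
        print("run %d: %.4f s" % (i, time.time() - start))
# Run 0 is typically much slower: on GPU, cuDNN tries several convolution
# algorithms once (autotuning), so the stream shows many extra Conv2D
# kernels; changing the input shape triggers autotuning again.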

How can I activate Tensorflow's XLA for the C API?

Submitted by 只愿长相守 on 2019-12-11 15:39:02
Question: I have built Tensorflow from source and I am using its C API. So far everything works well; I am also using AVX / AVX2, and my from-source build was compiled with XLA support. I would now like to activate XLA (accelerated linear algebra) as well, as I hope it will further increase performance / speed during inference. If I start my run right now I get this message:

2019-06-17 16:09:06.753737: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1541] (One-time warning):
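The C API does not expose a JIT switch directly, but TF_SetConfig() accepts a serialized ConfigProto. One plausible route, offered as a sketch (TF 1.x assumed; the output filename is hypothetical), is to serialize a config with the global JIT level enabled and pass those bytes to TF_SetConfig; for CPU inference the TF_XLA_FLAGS=--tf_xla_cpu_global_jit environment variable may also be needed on top of the session option:

import tensorflow as tf

# Build a ConfigProto that turns on global JIT (XLA) and serialize it.
# The bytes written here can be passed to TF_SetConfig() on the C side.
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)
with open("jit_config.pb", "wb") as f:   # hypothetical filename
    f.write(config.SerializeToString())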

Tensorflow - XLA | Passing tensors to external functions at runtime

Submitted by 有些话、适合烂在心里 on 2019-12-11 05:55:28
Question: I'm testing a library that offloads certain sensitive computations into a secure environment. TensorFlow is one application that my team and I are interested in porting to work with this, especially with XLA. My team wasn't successful in adding a TF op that does the offload. For this to work with XLA, I need to insert XLA ops that send and receive data to this library through an API provided by the library. My understanding is that these XLA ops have to be added to the translation of

Tensorflow: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

Submitted by 六月ゝ 毕业季﹏ on 2019-12-08 18:03:23
Question: I got this when using Keras with the TensorFlow backend:

tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

Relevant code:

tfconfig = tf.ConfigProto()
tfconfig.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tfconfig.gpu_options.allow_growth = True
K.tensorflow_backend.set_session(tf.Session(config=tfconfig))

TensorFlow version: 1.14.0

Answer 1: This could be due to your TF
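One quick sanity check, independent of the answer above, is to list the devices the XLA service actually registered; on a healthy XLA-enabled GPU build the plain GPU device should appear alongside XLA_CPU / XLA_GPU entries:

from tensorflow.python.client import device_lib

# Print every device TensorFlow registered; if CUDA:0 is visible but no
# XLA_GPU device appears, the XLA service never picked up the GPU.
for d in device_lib.list_local_devices():
    print(d.device_type, d.name)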

tensorflow XLA not producing the dot file

Submitted by 时光怂恿深爱的人放手 on 2019-12-08 04:00:48
Question: I am trying to follow the tutorial on XLA and JIT (https://www.tensorflow.org/performance/xla/jit). According to https://www.tensorflow.org/performance/xla/jit#step_3_run_with_xla, running the command given there should produce output with the location of the XLA graph. However, my output does not include this info:

Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
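A possible explanation, offered as an assumption rather than a confirmed fix: the graph-dump flag used by that TF 1.x tutorial was renamed in later releases, and on newer builds the HLO .dot dump is requested through XLA_FLAGS instead. A sketch (the dump directory is arbitrary):

import os

# Assumption: on newer TF builds the tutorial's dump flag was replaced by
# --xla_dump_to; it must be set before TensorFlow initializes XLA.
os.environ["XLA_FLAGS"] = "--xla_dump_to=/tmp/xla_dump --xla_dump_hlo_as_dot"

import tensorflow as tf  # import only after the flag is in the environment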
