tensorflow-xla

indexing in tensorflow slower than gather

Submitted by 房东的猫 on 2020-01-24 11:05:26
Question: I am trying to index into a tensor to get a slice or a single element from 1-D tensors. I find a significant performance difference (almost 30-40%) between NumPy-style indexing ([:] and slices) and tf.gather. I also observe that tf.gather has significant overhead when used on scalars (looping over an unstacked tensor) as opposed to a tensor. Is this a known issue? Example code (inefficient):

for node_idxs in graph.nodes():
    node_indice_list = tf.unstack(node_idxs)
    result = []
    for
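A minimal micro-benchmark sketch of the comparison, assuming TF 1.x sessions; the tensor size, slice bounds, and iteration count are illustrative, not taken from the question:

import time
import tensorflow as tf

# Hypothetical 1-D tensor; size, bounds, and loop count are illustrative.
x = tf.constant(list(range(10000)), dtype=tf.int32)
sliced = x[100:200]                           # NumPy-style slice
gathered = tf.gather(x, tf.range(100, 200))   # the same slice via tf.gather

with tf.Session() as sess:
    for name, op in [("slice", sliced), ("gather", gathered)]:
        sess.run(op)  # warm-up, so one-time setup is not measured
        start = time.time()
        for _ in range(1000):
            sess.run(op)
        print(name, time.time() - start)

Timing each op over many runs like this separates per-call overhead (which is where tf.gather tends to lose on small inputs) from the one-time graph setup cost.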

Why is TensorFlow XLA in experimental status

Submitted by 筅森魡賤 on 2020-01-03 05:04:25
Question: I'm interested in using XLA for training with a custom device (FPGA, ...). However, I learned from the developer's tutorial that XLA is currently in experimental status: https://www.tensorflow.org/performance/xla/ I did not understand why XLA is experimental. Is there any big issue with XLA beyond the performance improvement? Thanks

Answer 1: XLA is still very new: it was released in March 2017. As stated on the TensorFlow XLA page: Note: XLA is experimental and considered alpha.

First tf.session.run() performs dramatically different from later runs. Why?

Submitted by 让人想犯罪 __ on 2019-12-20 12:39:10
Question: Here's an example to clarify what I mean:

First session.run(): First run of a TensorFlow session
Later session.run(): Later runs of a TensorFlow session

I understand TensorFlow is doing some initialization here, but I'd like to know where in the source this manifests. This occurs on CPU as well as GPU, but the effect is more prominent on GPU. For example, in the case of an explicit Conv2D operation, the first run has a much larger quantity of Conv2D operations in the GPU stream. In fact, if I change the input size of the Conv2D, it can go from tens to hundreds of stream Conv2D operations. In
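Much of this first-run cost is commonly attributed to one-time setup: CUDA context creation, cuDNN convolution autotuning, graph optimization, and memory-allocator warm-up. A sketch of how to see the effect, with arbitrary sizes (TF 1.x style):

import time
import numpy as np
import tensorflow as tf

# Arbitrary Conv2D graph, just to expose the first-run overhead.
x = tf.placeholder(tf.float32, [1, 224, 224, 3])
y = tf.layers.conv2d(x, filters=64, kernel_size=3)

feed = {x: np.zeros([1, 224, 224, 3], np.float32)}
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(3):
        start = time.time()
        sess.run(y, feed_dict=feed)
        print("run %d: %.4f s" % (i, time.time() - start))
# Run 0 is typically much slower: on GPU, cuDNN tries several convolution
# algorithms once (autotuning), so the stream shows many extra Conv2D
# kernels; changing the input shape triggers autotuning again.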

How can I activate Tensorflow's XLA for the C API?

Submitted by 只愿长相守 on 2019-12-11 15:39:02
Question: I have built Tensorflow from source and I am using its C API. So far everything works well; I am also using AVX / AVX2, and my from-source build was compiled with XLA support. I would now like to activate XLA (accelerated linear algebra) as well, as I hope it will further increase performance / speed during inference. If I start my run right now I get this message:

2019-06-17 16:09:06.753737: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1541] (One-time warning):
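The C API does not expose a JIT switch directly, but TF_SetConfig() accepts a serialized ConfigProto. One plausible route, offered as a sketch (TF 1.x assumed; the output filename is hypothetical), is to serialize a config with the global JIT level enabled and pass those bytes to TF_SetConfig; for CPU inference the TF_XLA_FLAGS=--tf_xla_cpu_global_jit environment variable may also be needed on top of the session option:

import tensorflow as tf

# Build a ConfigProto that turns on global JIT (XLA) and serialize it.
# The bytes written here can be passed to TF_SetConfig() on the C side.
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.ON_1)
with open("jit_config.pb", "wb") as f:   # hypothetical filename
    f.write(config.SerializeToString())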

Tensorflow - XLA | Passing tensors to external functions at runtime

Submitted by 有些话、适合烂在心里 on 2019-12-11 05:55:28
Question: I'm testing a library that offloads certain sensitive computations into a secure environment. TensorFlow is one application that my team and I are interested in porting to work with this, especially with XLA. My team wasn't successful in adding a TF op that does the offload. For this to work with XLA, I need to insert XLA ops that send and receive data to this library through an API provided by the library. My understanding is that these XLA ops have to be added to the translation of

Tensorflow: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

Submitted by 六月ゝ 毕业季﹏ on 2019-12-08 18:03:23
Question: I got this when using Keras with the TensorFlow backend:

tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

Relevant code:

tfconfig = tf.ConfigProto()
tfconfig.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tfconfig.gpu_options.allow_growth = True
K.tensorflow_backend.set_session(tf.Session(config=tfconfig))

TensorFlow version: 1.14.0

Answer 1: This could be due to your TF
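One quick sanity check, independent of the answer above, is to list the devices the XLA service actually registered; on a healthy XLA-enabled GPU build the plain GPU device should appear alongside XLA_CPU / XLA_GPU entries:

from tensorflow.python.client import device_lib

# Print every device TensorFlow registered; if CUDA:0 is visible but no
# XLA_GPU device appears, the XLA service never picked up the GPU.
for d in device_lib.list_local_devices():
    print(d.device_type, d.name)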

tensorflow XLA not producing the dot file

Submitted by 时光怂恿深爱的人放手 on 2019-12-08 04:00:48
Question: I am trying to follow the tutorial on XLA and JIT (https://www.tensorflow.org/performance/xla/jit). According to https://www.tensorflow.org/performance/xla/jit#step_3_run_with_xla, running the command given there should produce output with the location of the XLA graph. However, my output does not include this info:

Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
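A possible explanation, offered as an assumption rather than a confirmed fix: the graph-dump flag used by that TF 1.x tutorial was renamed in later releases, and on newer builds the HLO .dot dump is requested through XLA_FLAGS instead. A sketch (the dump directory is arbitrary):

import os

# Assumption: on newer TF builds the tutorial's dump flag was replaced by
# --xla_dump_to; it must be set before TensorFlow initializes XLA.
os.environ["XLA_FLAGS"] = "--xla_dump_to=/tmp/xla_dump --xla_dump_hlo_as_dot"

import tensorflow as tf  # import only after the flag is in the environment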
