Question
Using the current TensorFlow quantization ops, how would I go about simulating per-channel quantization during inference? This paper defines per-layer quantization as
We can specify a single quantizer (defined by the scale and zero-point) for an entire tensor referred to as per-layer quantization
and per-channel quantization as
Per-channel quantization has a different scale and offset for each convolutional kernel.
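To make the distinction concrete, here is a rough illustration of how the ranges would be derived in each scheme for an asymmetric 8-bit quantizer (the weight shape and the HWIO layout with output channels last are just placeholders I chose):
import numpy as np

W = np.random.randn(9, 5, 1, 96).astype('float32')  # HWIO layout, 96 output channels

# Per-layer: one (scale, zero_point) pair for the whole tensor.
w_min, w_max = W.min(), W.max()
layer_scale = (w_max - w_min) / 255.0
layer_zero_point = np.round(-w_min / layer_scale)

# Per-channel: one (scale, zero_point) pair per convolutional kernel,
# i.e. per slice along the output-channel axis -> 96 pairs.
ch_min = W.min(axis=(0, 1, 2))          # shape (96,)
ch_max = W.max(axis=(0, 1, 2))          # shape (96,)
ch_scale = (ch_max - ch_min) / 255.0
ch_zero_point = np.round(-ch_min / ch_scale)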
Let's assume we have this subgraph
import numpy as np
import tensorflow as tf

x = (np.random.uniform(size=500*80*64*1)
     .astype('float32')
     .reshape(500, 80, 64, 1))
W1 = tf.get_variable('W1', [9, 5, 1, 96],
                     initializer=tf.truncated_normal_initializer(stddev=0.1))
h1 = tf.nn.conv2d(x, W1, strides=[1, 1, 1, 1], padding='VALID')
With the current APIs, I would probably do something like this to simulate per-layer quantization at inference time:
import numpy as np
import tensorflow as tf

x = (np.random.uniform(size=500*80*64*1)
     .astype('float32')
     .reshape(500, 80, 64, 1))
min_x = tf.reduce_min(x)
max_x = tf.reduce_max(x)

W1 = tf.get_variable('W1', [9, 5, 1, 96],
                     initializer=tf.truncated_normal_initializer(stddev=0.1))
min_W1 = tf.reduce_min(W1)
max_W1 = tf.reduce_max(W1)

qX = tf.quantize(x, min_x, max_x, tf.quint8, mode='MIN_FIRST')
qW = tf.quantize(W1, min_W1, max_W1, tf.quint8, mode='MIN_FIRST')

# This is how one would simulate per-layer quantization for convolution.
qXW = tf.nn.quantized_conv2d(qX[0], qW[0], qX[1], qX[2], qW[1], qW[2],
                             strides=[1, 1, 1, 1], padding='VALID')
My question is: how would I simulate per-channel quantization? As I understand it, tf.quantization.quantize is actually doing per-layer quantization, not per-channel quantization, and tf.nn.quantized_conv2d is convolving a per-layer-quantized input with per-layer-quantized kernels.
As per my understanding of per-channel quantization, there would be k values of output_min and output_max, where k is 96 in my example (the number of kernels, analogous to this API).
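For illustration (my own sketch, reusing x and W1 from the snippet above), those k = 96 per-kernel ranges could be computed by reducing over every axis except the output-channel axis:
# One (min, max) pair per convolutional kernel, i.e. per output channel.
min_W1_ch = tf.reduce_min(W1, axis=[0, 1, 2])   # shape (96,)
max_W1_ch = tf.reduce_max(W1, axis=[0, 1, 2])   # shape (96,)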
Are there any existing ops in TensorFlow that can handle per-channel quantization, or is there a way of making it work with existing ops?
Answer 1:
At the moment there is no way to simulate per-channel quantization inference in TFLite. As far as I can see, TensorFlow developers are currently implementing experimental symmetric per-channel quantization, but there is no way to test it yet.
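To illustrate what symmetric per-channel quantization means, here is a rough sketch of the general scheme (my own illustration, not TensorFlow's experimental implementation): each kernel gets its own scale derived from its maximum absolute value, and the zero-point is fixed at 0.
import numpy as np

W = np.random.randn(9, 5, 1, 96).astype('float32')

# Symmetric per-channel quantization: zero-point fixed at 0,
# one scale per output channel, derived from that channel's max |w|.
abs_max = np.abs(W).max(axis=(0, 1, 2))      # shape (96,)
scale = abs_max / 127.0                      # int8 range [-127, 127]
W_q = np.round(W / scale).astype('int8')     # quantized weights
W_dq = W_q.astype('float32') * scale         # dequantized approximation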
Source: https://stackoverflow.com/questions/54166589/tensorflow-per-channel-quantization