Question
Using the current TensorFlow quantization ops, how would I go about simulating per-channel quantization during inference? This paper defines per-layer quantization as
We can specify a single quantizer (defined by the scale and zero-point) for an entire tensor referred to as per-layer quantization
and per-channel quantization as
Per-channel quantization has a different scale and offset for each convolutional kernel.
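To make the distinction concrete, here is a rough illustration of how the ranges would be derived in each scheme for an asymmetric 8-bit quantizer (the weight shape and the HWIO layout with output channels last are just placeholders I chose):
import numpy as np

W = np.random.randn(9, 5, 1, 96).astype('float32')  # HWIO layout, 96 output channels

# Per-layer: one (scale, zero_point) pair for the whole tensor.
w_min, w_max = W.min(), W.max()
layer_scale = (w_max - w_min) / 255.0
layer_zero_point = np.round(-w_min / layer_scale)

# Per-channel: one (scale, zero_point) pair per convolutional kernel,
# i.e. per slice along the output-channel axis -> 96 pairs.
ch_min = W.min(axis=(0, 1, 2))          # shape (96,)
ch_max = W.max(axis=(0, 1, 2))          # shape (96,)
ch_scale = (ch_max - ch_min) / 255.0
ch_zero_point = np.round(-ch_min / ch_scale)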
Let's assume we have this subgraph
import numpy as np
import tensorflow as tf

x = (np.random.uniform(size=500*80*64*1)
     .astype('float32')
     .reshape(500, 80, 64, 1))
W1 = tf.get_variable('W1', [9, 5, 1, 96],
                     initializer=tf.truncated_normal_initializer(stddev=0.1))
h1 = tf.nn.conv2d(x, W1, strides=[1, 1, 1, 1], padding='VALID')
With the current APIs, I would probably do something like this to simulate per-layer quantization at inference time:
import numpy as np
import tensorflow as tf

x = (np.random.uniform(size=500*80*64*1)
     .astype('float32')
     .reshape(500, 80, 64, 1))
min_x = tf.reduce_min(x)
max_x = tf.reduce_max(x)

W1 = tf.get_variable('W1', [9, 5, 1, 96],
                     initializer=tf.truncated_normal_initializer(stddev=0.1))
min_W1 = tf.reduce_min(W1)
max_W1 = tf.reduce_max(W1)

qX = tf.quantize(x, min_x, max_x, tf.quint8, mode='MIN_FIRST')
qW = tf.quantize(W1, min_W1, max_W1, tf.quint8, mode='MIN_FIRST')

# This is how one would simulate per-layer quantization for convolution.
qXW = tf.nn.quantized_conv2d(qX[0], qW[0], qX[1], qX[2], qW[1], qW[2],
                             strides=[1, 1, 1, 1], padding='VALID')
My question is: how would I simulate per-channel quantization? As I understand it, tf.quantization.quantize is actually doing per-layer quantization, not per-channel quantization, and tf.nn.quantized_conv2d is convolving a per-layer-quantized input with per-layer-quantized kernels.
As per my understanding of per-channel quantization, there would be k values of output_min and output_max, where k is 96 in my example (the number of kernels, analogous to this API).
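For illustration (my own sketch, reusing x and W1 from the snippet above), those k = 96 per-kernel ranges could be computed by reducing over every axis except the output-channel axis:
# One (min, max) pair per convolutional kernel, i.e. per output channel.
min_W1_ch = tf.reduce_min(W1, axis=[0, 1, 2])   # shape (96,)
max_W1_ch = tf.reduce_max(W1, axis=[0, 1, 2])   # shape (96,)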
Are there any existing ops in TensorFlow that can handle per-channel quantization, or is there a way of making it work with existing ops?
Answer 1:
At the moment there is no way to simulate per-channel quantization inference in TFLite. As far as I can see, TensorFlow developers are currently implementing experimental symmetric per-channel quantization, but there is no way to test it yet.
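To illustrate what symmetric per-channel quantization means, here is a rough sketch of the general scheme (my own illustration, not TensorFlow's experimental implementation): each kernel gets its own scale derived from its maximum absolute value, and the zero-point is fixed at 0.
import numpy as np

W = np.random.randn(9, 5, 1, 96).astype('float32')

# Symmetric per-channel quantization: zero-point fixed at 0,
# one scale per output channel, derived from that channel's max |w|.
abs_max = np.abs(W).max(axis=(0, 1, 2))      # shape (96,)
scale = abs_max / 127.0                      # int8 range [-127, 127]
W_q = np.round(W / scale).astype('int8')     # quantized weights
W_dq = W_q.astype('float32') * scale         # dequantized approximation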
Source: https://stackoverflow.com/questions/54166589/tensorflow-per-channel-quantization