Paper Notes: Knowledge Isomorphism between Neural Networks

Published on arXiv, 2019 (original paper link)
This paper uses knowledge isomorphism to explain model compression and knowledge distillation. In my opinion, it has some real merits.

Motivation

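The paper considers two neural networks A and B, where $x_{A}$ and $x_{B}$ denote the features they produce on the same task. These features are disentangled as $x_{A} = \hat{x}_{A} + \epsilon_{A}$ and $x_{B} = \hat{x}_{B} + \epsilon_{B}$. Here $(\hat{x}_{A},\hat{x}_{B})$ denotes the corresponding isomorphic knowledge of $x_{A}$ and $x_{B}$, and $(\epsilon_{A},\epsilon_{B})$ denotes their non-isomorphic feature components.
Put simply, the first pair captures the features the two networks have in common, and the second pair captures the features specific to each network (this is my own reading). The authors also define $\hat{x}_{A}$ and $\hat{x}_{B}$ as being reconstructible from each other.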
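To make the decomposition concrete, here is a minimal sketch on synthetic data. It is not the paper's method: the features are random stand-ins, and the reconstruction is a plain linear least-squares map rather than a trained network. The names `shared`, `w`, and `explained` are my own, introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the features of networks A and B on the same inputs
# (an assumption for illustration, not data from the paper).
n_samples, dim_shared, dim_a, dim_b = 500, 4, 6, 5
shared = rng.normal(size=(n_samples, dim_shared))  # knowledge both networks encode
x_a = shared @ rng.normal(size=(dim_shared, dim_a)) + 0.5 * rng.normal(size=(n_samples, dim_a))
x_b = shared @ rng.normal(size=(dim_shared, dim_b)) + 0.5 * rng.normal(size=(n_samples, dim_b))

# Linear stand-in for the reconstruction: x_hat_a = x_b @ w, fit by least squares.
w, *_ = np.linalg.lstsq(x_b, x_a, rcond=None)

x_hat_a = x_b @ w        # part of x_a that can be reconstructed from x_b
eps_a = x_a - x_hat_a    # part of x_a that cannot (non-isomorphic component)

# Fraction of the energy of x_a captured by the reconstructible part.
explained = 1.0 - (eps_a ** 2).sum() / (x_a ** 2).sum()
print(f"fraction of x_a reconstructed from x_b: {explained:.3f}")
```

By construction `x_hat_a + eps_a` recovers `x_a` exactly, and the least-squares residual `eps_a` is orthogonal to `x_b`, which matches the intuition that it holds what B's features cannot account for.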

Architecture

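Since $\hat{x}_{A}$ and $\hat{x}_{B}$ are defined to be mutually reconstructible, the authors introduce a reconstruction model $g_{k}(\cdot)$ and define $\hat{x}_{A} = g_{k}(x_{B})$. Here $k$ is the order of the isomorphic feature (the k-order isomorphic feature), which we can take as a fuzziness level. The following figure shows the effect of the order $k$ (a higher $k$ does indeed give a fuzzier result):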
[Figure: effect of the order $k$ on the isomorphic features]
As for how $k$ relates to $g$, the paper states it directly: "More crucially, we implement the network-agnostic model $g$ as a neural network, where $k$ is set as the number of non-linear layers in $g$."

The explanation model used in the paper is given there as both a formula and a figure:
[Formula and figure: structure of the reconstruction model $g$]
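(In the formula, the term $\Sigma^{-\frac{1}{2}}_{(k+1)} h^{(k+1)}$ corresponds to the Norm module in the figure, where $\Sigma_{(k+1)}$ is the diagonal matrix of element-wise variances of $h^{(k+1)}$. $W$ denotes a linear or convolutional layer.)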
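The Norm module can be sketched in a few lines of NumPy. This is my own illustration under stated assumptions: the variance is estimated over a batch, a small `eps` is added for numerical stability, and no mean is subtracted because the description above only mentions variances. The function name `norm_module` is hypothetical.

```python
import numpy as np

def norm_module(h, eps=1e-8):
    """Apply Sigma^{-1/2} to h, where Sigma is the diagonal matrix of
    element-wise variances of h estimated over the batch axis.
    eps is an added stabilizer (assumption), not part of the description."""
    var = h.var(axis=0)            # one variance per feature element
    return h / np.sqrt(var + eps)  # a diagonal matrix acts as element-wise scaling

rng = np.random.default_rng(0)
# Three feature elements with very different scales.
h = rng.normal(size=(1000, 3)) * np.array([0.1, 1.0, 10.0])
h_normed = norm_module(h)

# The same operation written with an explicit diagonal matrix.
sigma_inv_sqrt = np.diag(1.0 / np.sqrt(h.var(axis=0) + 1e-8))
h_normed_explicit = h @ sigma_inv_sqrt

print(h.var(axis=0))         # variances differ by orders of magnitude
print(h_normed.var(axis=0))  # each close to 1 after normalization
```

Because $\Sigma$ is diagonal, multiplying by $\Sigma^{-\frac{1}{2}}$ is just a per-element rescaling, which is why the element-wise division and the explicit matrix product agree.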

As the figure shows, the entire network can be separated into $(K + 1)$ branches, where the $k$-th branch contains $k$ non-linear layers.
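The branch structure can be sketched as follows. This is one possible wiring that satisfies the description (branch $k$ passes through exactly $k$ ReLU layers, and the branch outputs are summed), not the paper's exact formula. The helpers `make_params` and `branch_outputs`, the weight lists `hidden` and `out`, and all layer sizes are hypothetical.

```python
import numpy as np

def norm(h, eps=1e-8):
    # Element-wise variance normalization (the Norm module); eps is an assumption.
    return h / np.sqrt(h.var(axis=0) + eps)

def relu(h):
    return np.maximum(h, 0.0)

def make_params(dim_in, dim_hidden, dim_out, K, rng):
    """Random weights. hidden[k] maps h^(k) to h^(k+1);
    out[k] maps h^(k) to the output of branch k."""
    dims = [dim_in] + [dim_hidden] * K
    hidden = [rng.normal(size=(dims[k], dims[k + 1])) / np.sqrt(dims[k]) for k in range(K)]
    out = [rng.normal(size=(dims[k], dim_out)) / np.sqrt(dims[k]) for k in range(K + 1)]
    return hidden, out

def branch_outputs(x_b, hidden, out):
    """Return the K+1 branch outputs. Branch k reads h^(k), which has
    passed through exactly k ReLU layers (branch 0 has none)."""
    h = x_b
    outputs = []
    for k in range(len(out)):
        outputs.append(norm(h) @ out[k])    # linear read-out of branch k
        if k < len(hidden):
            h = relu(norm(h) @ hidden[k])   # one more non-linear layer
    return outputs

rng = np.random.default_rng(0)
K = 3
hidden, out = make_params(dim_in=5, dim_hidden=8, dim_out=6, K=K, rng=rng)
x_b = rng.normal(size=(200, 5))

branches = branch_outputs(x_b, hidden, out)
x_hat_a = sum(branches)  # reconstruction as the sum of all branches
print(len(branches), x_hat_a.shape)
```

Truncating the sum to the first $k + 1$ entries of `branches` gives a reconstruction that uses at most $k$ non-linear layers, which is how I picture the order $k$ acting as a fuzziness level.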

Source: CSDN

Author: ChrisXue228

Link: https://blog.csdn.net/weixin_40423134/article/details/103914209
