How to reduce a fully connected (`"InnerProduct"`) layer using truncated SVD

梦毁少年i · 2020-12-01 19:41

In the paper "Fast R-CNN" (Girshick, ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes to use the SVD trick to reduce the size and computation time of a fully connected layer. Given a trained Caffe model, how can a fully connected (`"InnerProduct"`) layer be replaced with its truncated-SVD approximation?

2 Answers
  • 2020-12-01 20:28

    Actually, Ross Girshick's py-faster-rcnn repo includes an implementation of the SVD step: compress_net.py.

    BTW, you usually need to fine-tune the compressed model to recover its accuracy (or compress in a more sophisticated way; see for example "Accelerating Very Deep Convolutional Networks for Classification and Detection", Zhang et al.).
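
    If you go the fine-tuning route, a minimal pycaffe sketch could look like the following; solver_svd.prototxt is a hypothetical solver file whose net points at a trainable version of the compressed model, and the weights file is the one produced by the net surgery in the answer below:

      import caffe

      # Hypothetical solver file: its `net` field must point at a train/val
      # version of the compressed (SVD) model definition.
      solver = caffe.SGDSolver('solver_svd.prototxt')
      # Start from the compressed weights instead of a random initialization.
      solver.net.copy_from('trained_weights_svd.caffemodel')
      # Run the fine-tuning iterations configured in the solver.
      solver.solve()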

    Also, for me scipy.linalg.svd worked faster than numpy's svd.
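
    If you want to try that, scipy.linalg.svd follows the same (U, s, V*) return convention as numpy and is essentially a drop-in replacement here; a quick sketch (the matrix size is an arbitrary illustration value):

      import numpy as np
      from scipy.linalg import svd

      W = np.random.randn(1000, 4096).astype('f4')  # an arbitrary FC-sized matrix
      # full_matrices=False keeps only the min(m, n) singular vectors,
      # which is all a truncated approximation needs.
      U, s, Vt = svd(W, full_matrices=False)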

  • 2020-12-01 20:36

    Some linear-algebra background
    Singular Value Decomposition (SVD) is a factorization of any matrix W into a product of three matrices:

    W = U S V*
    

    Where U and V are orthogonal matrices (their columns are orthonormal), and S is diagonal, with non-negative singular values in decreasing order on its diagonal. One of the interesting properties of SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading singular values (zeroing the rest of the diagonal); then

    W_app = U S_trunc V*
    

    is a rank-k approximation of W. In fact, by the Eckart-Young theorem, it is the best rank-k approximation of W in the Frobenius norm.
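
    A quick numpy sketch of this property (the matrix size and the rank k are arbitrary illustration values):

      import numpy as np

      W = np.random.randn(100, 400)                 # an arbitrary "weight" matrix
      U, s, Vt = np.linalg.svd(W, full_matrices=False)

      k = 20                                        # truncation rank
      W_app = U[:, :k] @ (s[:k, None] * Vt[:k, :])  # U S_trunc V*

      print(np.linalg.matrix_rank(W_app))           # -> 20
      # relative approximation error; shrinks as k grows
      print(np.linalg.norm(W - W_app) / np.linalg.norm(W))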

    Using SVD to approximate a fully connected layer
    Suppose we have a model deploy_full.prototxt with a fully connected layer:

    # ... some layers here
    layer {
      name: "fc_orig"
      type: "InnerProduct"
      bottom: "in"
      top: "out"
      inner_product_param {
        num_output: 1000
        # more params...
      }
      # some more...
    }
    # more layers...
    

    Furthermore, we have trained_weights_full.caffemodel, the trained parameters for the deploy_full.prototxt model.

    1. Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Since the rank-k matrix W_app factors into a product of two thin matrices (one with k columns, one with k rows), the single fully connected layer can be replaced by two consecutive ones with a k-dimensional "bottleneck" between them; note that the first layer carries no bias term. Replace the fully connected layer with these two layers:

      layer {
        name: "fc_svd_U"
        type: "InnerProduct"
        bottom: "in" # same input
        top: "svd_interim"
        inner_product_param {
          num_output: 20  # approximate with k = 20 rank matrix
          bias_term: false
          # more params...
        }
        # some more...
      }
      # NO activation layer here!
      layer {
        name: "fc_svd_V"
        type: "InnerProduct"
        bottom: "svd_interim"
        top: "out"   # same output
        inner_product_param {
          num_output: 1000  # original number of outputs
          # more params...
        }
        # some more...
      }
      
    2. In Python, do a little net surgery:

      import caffe
      import numpy as np
      
      orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
      svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
      # get the original weight matrix; caffe stores it as (num_output x input_dim)
      W = np.array(orig_net.params['fc_orig'][0].data)
      # SVD decomposition; numpy's V is already V* (its rows are the right singular vectors)
      k = 20  # same as num_output of fc_svd_U
      U, s, V = np.linalg.svd(W, full_matrices=False)
      # assign weights to svd net: since W_app x = U S_trunc (V* x), the first
      # (dimension-reducing) layer must hold the k leading rows of V*, and the
      # second layer must hold U S_trunc
      svd_net.params['fc_svd_U'][0].data[...] = V[:k, :]          # shape (k, input_dim)
      svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] * s[:k]  # shape (num_output, k)
      svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data  # same bias
      # save the new weights
      svd_net.save('trained_weights_svd.caffemodel')
      

    Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel that approximate the original net with far fewer multiplications and weights: the original layer needs num_output × input_dim parameters (and as many multiply-accumulates per sample), while the two SVD layers together need only k × (num_output + input_dim), a big saving when k is small.
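
    As a sanity check (a numpy-only sketch, assuming the two nets from the previous step are still in scope), you can verify that the composition of the two new layers reproduces the rank-k approximation of the original weights:

      import numpy as np

      W   = np.array(orig_net.params['fc_orig'][0].data)   # (num_output, input_dim)
      W_U = np.array(svd_net.params['fc_svd_U'][0].data)   # (k, input_dim)
      W_V = np.array(svd_net.params['fc_svd_V'][0].data)   # (num_output, k)

      # The two stacked InnerProduct layers compute W_V (W_U x), i.e. the
      # effective weight matrix is W_V @ W_U = U S_trunc V*.
      W_app = W_V @ W_U
      # relative approximation error for the chosen k (up to float32 round-off)
      print(np.linalg.norm(W - W_app) / np.linalg.norm(W))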
