caffe: What does the **group** param mean?

囚心锁ツ 2021-02-04 14:08

I have read the documentation about the group param:

group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a su

3 answers
  •  旧巷少年郎
    2021-02-04 14:51

    First of all, Caffe only defines the behavior when both input_channel and output_channel are multiples of group. We can confirm this from the source code:

    CHECK_EQ(channels_ % group_, 0);
    CHECK_EQ(num_output_ % group_, 0)
      << "Number of output should be multiples of group.";
    

    Secondly, the group parameter determines the number of filter parameters, specifically the channel depth of each filter. Each filter has input_channel / group channels. This can also be confirmed from the source code:

    vector<int> weight_shape(2);
    weight_shape[0] = conv_out_channels_;
    weight_shape[1] = conv_in_channels_ / group_;
    

    Note that weight_shape[0] is the number of filters.
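
To make the two source excerpts concrete, here is a minimal Python sketch that mirrors both divisibility checks and the weight-shape computation. The function name `conv_weight_shape` and its arguments are hypothetical, chosen for illustration; they are not part of Caffe's API. Appending the kernel height and width after the two entries shown in the excerpt is an assumption about the full 4-D filter blob.

```python
def conv_weight_shape(in_channels, out_channels, group, kernel_h, kernel_w):
    """Return the filter blob shape for a grouped convolution (illustrative helper)."""
    # Mirrors CHECK_EQ(channels_ % group_, 0)
    if in_channels % group != 0:
        raise ValueError("input channels must be a multiple of group")
    # Mirrors CHECK_EQ(num_output_ % group_, 0)
    if out_channels % group != 0:
        raise ValueError("number of outputs must be a multiple of group")
    # weight_shape[0] = number of filters, weight_shape[1] = channels per filter
    return (out_channels, in_channels // group, kernel_h, kernel_w)


print(conv_weight_shape(40, 20, 20, 3, 3))  # (20, 2, 3, 3)
```

Calling the helper with 40 input channels, 50 outputs and group 20 raises a ValueError, which corresponds to the second check failing.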


    So, regarding your question:

    In Caffe, if input_channel is 40 and group is 20:

    1. output_channel cannot be 50, because 50 is not a multiple of 20.
    2. If output_channel is 20 (remember, this means you have 20 filters), every 2 input channels are responsible for one output channel. For example, the 0th output channel is computed from the 0th and 1st input channels and does not depend on any other input channel.
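    3. If output_channel equals input_channel (i.e. output_channel = 40), each group maps 2 input channels to 2 output channels, so every output channel is still computed from 2 input channels. The well-known depthwise convolution is the case group = input_channel = output_channel (here, group = 40), where each output channel is computed from exactly one input channel.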
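
The channel bookkeeping in this list can be sketched in a few lines of Python. The helper `input_channels_for_output` is hypothetical and only illustrates the grouping rule (output group g reads input group g); it is not Caffe code.

```python
def input_channels_for_output(out_idx, in_channels, out_channels, group):
    """List the input channel indices that feed one output channel."""
    if in_channels % group or out_channels % group:
        raise ValueError("both channel counts must be multiples of group")
    in_per_group = in_channels // group
    out_per_group = out_channels // group
    g = out_idx // out_per_group  # group this output channel belongs to
    return list(range(g * in_per_group, (g + 1) * in_per_group))


print(input_channels_for_output(0, 40, 20, 20))  # [0, 1]
print(input_channels_for_output(5, 40, 40, 40))  # [5]
```

With group = 20 and 40 channels on both sides, the same helper returns two indices per output channel (for example [2, 3] for output channel 3). A single index per output channel appears once group reaches 40.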

    Regarding Deconvolution:

    We almost always set group = output_channels. Here is the suggested configuration for a Deconvolution layer from the official documentation:

    layer {
      name: "upsample", type: "Deconvolution"
      bottom: "{{bottom_name}}" top: "{{top_name}}"
      convolution_param {
        kernel_size: {{2 * factor - factor % 2}} stride: {{factor}}
        num_output: {{C}} group: {{C}}
        pad: {{ceil((factor - 1) / 2.)}}
        weight_filler: { type: "bilinear" } bias_term: false
      }
      param { lr_mult: 0 decay_mult: 0 }
    }
    

    with the following explanation:

    By specifying num_output: {{C}} group: {{C}}, it behaves as channel-wise convolution. The filter shape of this deconvolution layer will be (C, 1, K, K) where K is kernel_size, and this filler will set a (K, K) interpolation kernel for every channel of the filter identically. The resulting shape of the top feature map will be (B, C, factor * H, factor * W). Note that the learning rate and the weight decay are set to 0 in order to keep coefficient values of bilinear interpolation unchanged during training.
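
As a quick sanity check on the template, the following Python sketch evaluates its kernel_size and pad expressions for a given factor and applies the usual transposed-convolution size formula, out = stride * (in - 1) + kernel_size - 2 * pad, assuming no dilation. Both function names are hypothetical.

```python
import math


def bilinear_upsample_params(factor):
    """Evaluate the template expressions for kernel_size, stride and pad."""
    kernel_size = 2 * factor - factor % 2
    pad = math.ceil((factor - 1) / 2.0)
    return kernel_size, factor, pad


def deconv_output_size(in_size, kernel_size, stride, pad):
    """Spatial output size of a transposed convolution without dilation."""
    return stride * (in_size - 1) + kernel_size - 2 * pad


k, s, p = bilinear_upsample_params(2)
print(k, s, p)                          # 4 2 1
print(deconv_output_size(16, k, s, p))  # 32
```

For both even and odd factors, the result works out to factor * H, which agrees with the (B, C, factor * H, factor * W) shape stated in the quoted explanation.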
