I have read the documentation about the group param:
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a su
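The argument gives the number of groups, not the size of each group. If you have 40 input channels and set g to 20, you get 20 "lanes" of 2 input channels each. The output count has to divide evenly as well: Caffe checks that both the number of input channels and num_output are multiples of group, so 50 outputs with g = 20 is rejected rather than split into uneven groups. With 60 outputs, for example, each of the 20 groups would produce 3 output channels.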
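A minimal sketch of how the channels could be divided among groups, in plain Python rather than Caffe code. The helper name `group_channel_ranges` is hypothetical, and it assumes that groups are contiguous blocks of channels and that both counts must be multiples of the group count:

```python
def group_channel_ranges(channels, num_output, group=1):
    """Return, for each group, its (input channel range, output channel range)."""
    # Assumption: both counts must divide evenly by the number of groups.
    if channels % group != 0 or num_output % group != 0:
        raise ValueError("channels and num_output must both be multiples of group")
    in_per_group = channels // group
    out_per_group = num_output // group
    # Assumption: group g owns one contiguous block of inputs and of outputs.
    return [
        (range(g * in_per_group, (g + 1) * in_per_group),
         range(g * out_per_group, (g + 1) * out_per_group))
        for g in range(group)
    ]

lanes = group_channel_ranges(40, 50, group=2)
print(lanes[0])  # (range(0, 20), range(0, 25))
print(lanes[1])  # (range(20, 40), range(25, 50))
```

With group=20, the call works for 40 inputs and 60 outputs (20 lanes of 2 inputs and 3 outputs each) but raises ValueError for 40 inputs and 50 outputs.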
More often, you split into a small number of groups, such as 2. In that case, you have two processing "lanes". For the 40=>50 layer you mention, each group has 20 input channels and 25 output channels. The layer is split in half, and forward and backward propagation for each half use only that half's channels. The group parameter is set per layer, so the split applies only to the layers that specify it; a following layer without group sees all 50 channels again.
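To illustrate that each lane only sees its own input channels, here is a small sketch of a grouped 1x1 convolution evaluated at a single pixel. It is plain Python for illustration, not Caffe's implementation; the function name and the weight layout (one short weight list per output channel) are assumptions:

```python
def grouped_pointwise_conv(x, weights, group):
    """Grouped 1x1 convolution at one pixel.

    x: list of input channel values (length c_in).
    weights: weights[o] holds c_in // group weights for output channel o.
    """
    c_in, c_out = len(x), len(weights)
    in_per_group, out_per_group = c_in // group, c_out // group
    y = []
    for o in range(c_out):
        g = o // out_per_group  # group that output channel o belongs to
        lane = x[g * in_per_group:(g + 1) * in_per_group]  # only that group's inputs
        y.append(sum(w * v for w, v in zip(weights[o], lane)))
    return y

# 4 inputs, 4 outputs, 2 groups; every weight is 1, so each output is its lane's sum.
weights = [[1.0, 1.0] for _ in range(4)]
a = grouped_pointwise_conv([1.0, 2.0, 3.0, 4.0], weights, group=2)
b = grouped_pointwise_conv([1.0, 2.0, 30.0, 40.0], weights, group=2)
print(a)  # [3.0, 3.0, 7.0, 7.0]
print(b)  # [3.0, 3.0, 70.0, 70.0]
```

Changing the inputs of the second group changes only the second group's outputs; the first two outputs are identical in both runs.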
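The processing advantage is that instead of 40 * 50 = 2000 input-to-output channel connections, you have 2 groups of 20 * 25 = 500 connections, or 1000 in total: half as many. This cuts the layer's weights and multiply-adds by roughly 2x, typically with only a small loss in convergence progress; the actual speedup depends on the implementation.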
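The arithmetic can be checked with a short sketch. The helper `conv_weight_count` is a hypothetical name, it ignores bias terms, and the kernel size defaults to 1x1 so that the result is simply the number of channel-to-channel connections:

```python
def conv_weight_count(channels, num_output, group=1, kernel_h=1, kernel_w=1):
    """Number of weights (bias excluded) in a grouped convolution layer."""
    if channels % group != 0 or num_output % group != 0:
        raise ValueError("channels and num_output must both be multiples of group")
    # Each group connects channels/group inputs to num_output/group outputs.
    per_group = (channels // group) * (num_output // group) * kernel_h * kernel_w
    return group * per_group

print(conv_weight_count(40, 50, group=1))  # 2000
print(conv_weight_count(40, 50, group=2))  # 1000
```

In general the count is channels * num_output * kernel_h * kernel_w / group, so g groups reduce the weights by a factor of g.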