MXNet: nn.Activation vs nd.relu?


Question


I am new to MXNet (I am using it in Python 3).

Their tutorial series encourages you to define your own Gluon blocks.

So let's say this is your block (a common convolutional structure):

import mxnet as mx

class CNN1D(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = mx.nd.relu(self.cnn(x))
        x = mx.nd.relu(self.bn(x))
        x = mx.nd.relu(self.ramp(x))
        return x

This mirrors the structure of their example. What is the difference between mx.nd.relu and mx.gluon.nn.Activation?

Should it be

x = self.ramp(x)

instead of

x = mx.nd.relu(self.ramp(x))

Answer 1:


It appears that

mx.gluon.nn.Activation(activation=<act>)

is a wrapper for calling the underlying activation functions from the NDArray module.
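A quick way to check this (a minimal sketch; the input values are made up for illustration):

import mxnet as mx

x = mx.nd.array([[-2.0, -0.5, 0.0, 1.5, 3.0]])

# The Activation block holds no parameters, so it can be called directly
ramp = mx.gluon.nn.Activation(activation='relu')
print(ramp(x))        # [[0. 0. 0. 1.5 3.]]

# ...and it returns the same values as the NDArray function it dispatches to
print(mx.nd.relu(x))  # [[0. 0. 0. 1.5 3.]]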

Thus, in principle, it does not matter whether the forward definition uses

x = self.ramp(x)

or

x = mx.nd.relu(x)

or

x = mx.nd.relu(self.ramp(x))

as ReLU simply takes the max of 0 and the passed value, so multiple applications will not change the result any more than a single call (apart from a slight increase in runtime).

Thus in this case it doesn't really matter. Of course, with other activation functions, stacking multiple calls might have an impact.
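For example (a small sketch illustrating the idempotence point; sigmoid is used only as a contrasting example, and the printed values are approximate):

import mxnet as mx

x = mx.nd.array([-3.0, -1.0, 0.0, 2.0])

# ReLU is idempotent: applying it a second time changes nothing
print(mx.nd.relu(x))              # [0. 0. 0. 2.]
print(mx.nd.relu(mx.nd.relu(x)))  # [0. 0. 0. 2.]

# A non-idempotent activation such as sigmoid does change when stacked
print(mx.nd.sigmoid(x))                 # approx. [0.047 0.269 0.5   0.881]
print(mx.nd.sigmoid(mx.nd.sigmoid(x)))  # approx. [0.512 0.567 0.622 0.707]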

In MXNet's documentation, nd.relu is used in the forward definition when defining gluon.Blocks. This might carry slightly less overhead than using mx.gluon.nn.Activation(activation='relu').

Flavor-wise, the gluon module is meant to be the high-level abstraction. Therefore I am of the opinion that, when defining a block, one should use ramp = mx.gluon.nn.Activation(activation=<act>) in __init__ instead of nd.<act>(x), and then call self.ramp(x) in the forward definition.
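In that spirit, the block from the question could be written like this (a sketch of that recommendation, applying the activation once via self.ramp; the random input is only for illustration):

import mxnet as mx

class CNN1D(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        # conv -> batch norm -> a single ReLU, via the Activation block
        return self.ramp(self.bn(self.cnn(x)))

net = CNN1D()
net.initialize()
out = net(mx.nd.random.uniform(shape=(8, 3, 20)))  # (batch, channels, width)
print(out.shape)  # (8, 10, 20)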

However, given that at this point the custom Block tutorials and documentation stick to relu activations, whether or not this will have lasting consequences remains to be seen.

Altogether, mx.gluon.nn.Activation appears to be a way of calling the NDArray module's activation functions from within the Gluon module.




Answer 2:


mx.gluon.nn.Activation wraps mx.ndarray.Activation; see the Gluon source code.

However, when using Gluon to build a neural net, it is recommended that you use the Gluon API and not branch off to the lower-level MXNet API arbitrarily, which may cause issues as Gluon evolves and potentially changes (e.g. stops using mx.nd under the hood).
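For example, the block from the question could be expressed purely with Gluon layers (a minimal sketch, assuming the same Conv1D/BatchNorm/ReLU structure); one side benefit is that such a network can be hybridized:

import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Conv1D(10, 1),
            nn.BatchNorm(),
            nn.Activation('relu'))
net.initialize()
net.hybridize()  # possible because only Gluon layers are used, no raw mx.nd calls

out = net(mx.nd.random.uniform(shape=(8, 3, 20)))
print(out.shape)  # (8, 10, 20)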



Source: https://stackoverflow.com/questions/46285711/mxnet-nn-activation-vs-nd-relu
