Neural Net Bias per Layer or per Node (non-input node)

Submitted by ぐ巨炮叔叔 on 2019-12-03 06:41:51

Intuitive View

To answer this question properly, we should first establish exactly what we mean by "Bias value" as used in the question. Neural Networks are typically viewed intuitively (and explained to beginners) as a network of nodes (neurons) with weighted, directed connections between them. In this view, Biases are very frequently drawn as additional "input" nodes, which always have an activation level of exactly 1.0. This value of 1.0 may be what some people think of when they hear "Bias Value". Such a Bias Node has connections to other nodes, with trainable weights, and other people may think of those weights as the "Bias Values". Since the question was tagged with the bias-neuron tag, I'll answer under the assumption that we use the first definition, i.e. Bias Value = 1.0 for some Bias Node / neuron.

From this point of view, it does not matter mathematically how many Bias nodes/values we put in our network, as long as we make sure to connect them to the correct nodes. You could intuitively think of the entire network as having only a single bias node with a value of 1.0 that does not belong to any particular layer and has connections to all nodes other than the input nodes. This may be difficult to draw, though. If you want to make a drawing of your neural network, it may be more convenient to place a separate bias node (each with a value of 1.0) in every layer except for the output layer, and connect each of those bias nodes to all the nodes in the layer directly after it. Mathematically, these two interpretations are equivalent, since in both cases every non-input node has an incoming weighted connection from a node that always has an activation level of 1.0.
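To make the equivalence concrete, here is a minimal sketch in plain Python. The 2-2-1 network, the sigmoid activation, all weight values, and the function names are assumptions made up for illustration; none of them come from the question. The only difference between the two forward passes is whether the constant 1.0 comes from one shared bias node or from a separate bias node per layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical 2-2-1 network; all weight values are made up for illustration.
# hidden_weights[j][i] is the weight from input i to hidden node j.
hidden_weights = [[0.5, -0.3], [0.8, 0.2]]
hidden_bias_weights = [0.1, -0.4]   # one bias weight per hidden node
output_weights = [[1.2, -0.7]]
output_bias_weights = [0.3]         # one bias weight per output node


def layer(inputs, weights, bias_weights, bias_activation):
    """Activations of one layer, given the bias node that feeds it."""
    outputs = []
    for node_weights, bias_weight in zip(weights, bias_weights):
        total = sum(w * x for w, x in zip(node_weights, inputs))
        total += bias_weight * bias_activation
        outputs.append(sigmoid(total))
    return outputs


def forward_single_bias_node(inputs):
    """One global bias node connected to every non-input node."""
    bias = 1.0
    hidden = layer(inputs, hidden_weights, hidden_bias_weights, bias)
    return layer(hidden, output_weights, output_bias_weights, bias)


def forward_bias_node_per_layer(inputs):
    """A separate bias node in the input layer and in the hidden layer."""
    input_layer_bias = 1.0
    hidden_layer_bias = 1.0
    hidden = layer(inputs, hidden_weights, hidden_bias_weights, input_layer_bias)
    return layer(hidden, output_weights, output_bias_weights, hidden_layer_bias)


x = [0.5, -1.0]
same = forward_single_bias_node(x) == forward_bias_node_per_layer(x)
print(same)
```

Both functions perform the same arithmetic, so they agree on every input. What matters is that each non-input node has its own trainable bias weight; where the constant 1.0 is drawn is only a matter of presentation.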

Programming View

When Neural Networks are programmed, there typically aren't any explicit node "objects" at all (at least in efficient implementations). There will generally just be matrices for the weights. From this point of view, there is no longer any choice. We'll (almost) always want one "bias-weight" (a weight multiplied by a constant activation level of 1.0) going to every non-input node, and we'll have to make sure all those weights appear in the correct spots in our weight matrices.
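As a rough sketch of what "the correct spots in our weight matrices" can look like, the plain-Python example below computes the pre-activation values of one layer in two ways: with a separate bias vector, and with the bias weights stored as an extra column of the weight matrix while a constant 1.0 is appended to the input. The names `W`, `b`, `W_aug`, the helper functions, and all numbers are hypothetical. Real implementations would normally use a matrix library rather than nested lists.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical layer with 2 inputs and 2 nodes; values are made up.
W = [[0.5, -0.3],
     [0.8, 0.2]]      # W[j][i]: weight from input i to node j
b = [0.1, -0.4]       # one bias weight per node


def layer_separate_bias(x):
    """Pre-activations as W x + b, with the bias kept in its own vector."""
    return [z + b_j for z, b_j in zip(matvec(W, x), b)]


# Same layer with the bias weights stored as the last column of the matrix.
W_aug = [row + [b_j] for row, b_j in zip(W, b)]


def layer_augmented(x):
    """Pre-activations as W_aug [x; 1.0]; the appended 1.0 is the bias node."""
    return matvec(W_aug, x + [1.0])


x = [2.0, 1.0]
print(layer_separate_bias(x))
print(layer_augmented(x))
```

Either layout gives every node in the layer exactly one bias weight. The augmented form mirrors the "bias node with activation 1.0" picture directly, while the separate vector is just a different way of storing the same weights.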
