Occasionally I see some models are using SpatialDropout1D
instead of Dropout
. For example, in the Part of speech tagging neural network, they use:
To make it simple, I would first note that so-called feature maps (1D, 2D, etc.) is our regular channels. Let's look at examples:
Dropout()
: Let's define 2D input: [[1, 1, 1], [2, 2, 2]]. Dropout will consider every element independently, and may result in something like [[1, 0, 1], [0, 2, 2]]
SpatialDropout1D()
: In this case result will look like [[1, 0, 1], [2, 0, 2]]. Notice that 2nd element was zeroed along all channels.