Randomly zero out entire channels (a channel is a 2D feature map, e.g., the -th channel of the -th sample in the batched input is a 2D tensor ). Each channel will be zeroed out independently on every forward call with probability
pusing samples from a Bernoulli distribution.
Usually the input comes from
As described in the paper Efficient Object Localization Using Convolutional Networks , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease.
In this case,
nn.Dropout2d()will help promote independence between feature maps and should be used instead.
Output: (same shape as input)
>>> m = nn.Dropout2d(p=0.2) >>> input = torch.randn(20, 16, 32, 32) >>> output = m(input)