Variational Dropout for Deep Neural Networks and Linear Model

We are developing a new approach to neural network regularization. It is based on Variational Dropout [1], a recent Bayesian generalization of Gaussian dropout. Our contribution is twofold. First, the approach makes it possible to tune a separate dropout rate for each layer, feature, weight, or neuron. Second, it makes it possible to tune the network topology automatically: driving the dropout probability of an individual neuron to 1 effectively excludes that neuron from the model. We have already applied the approach to linear models, where it yields a new method of Bayesian feature selection whose performance is comparable to that of classic models such as the Relevance Vector Machine. Our future work is to make the technique work equally well on neural networks.
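The link between an individual dropout rate and feature exclusion can be illustrated on a linear model. The following is a minimal sketch, not the implementation used in this project. It assumes the Gaussian dropout parameterization described in [1], where a Bernoulli drop rate p corresponds to multiplicative noise with variance alpha = p / (1 - p), so that p approaching 1 means alpha grows without bound. The function names and the pruning threshold are hypothetical and chosen only for illustration.

```python
import numpy as np

def alpha_from_rate(p):
    """Noise variance of Gaussian dropout matching a Bernoulli drop rate p."""
    return p / (1.0 - p)

def noisy_linear(x, theta, log_alpha, rng):
    """Linear predictor with one multiplicative Gaussian noise term per weight.

    Each weight is theta_j * xi_j with xi_j ~ N(1, alpha_j).
    log_alpha holds one (learnable) value per weight.
    """
    alpha = np.exp(log_alpha)
    xi = 1.0 + np.sqrt(alpha) * rng.standard_normal(theta.shape)
    return x @ (theta * xi)

def keep_mask(log_alpha, threshold=3.0):
    """Mark weights whose noise level is still below a cut-off.

    The threshold is an arbitrary illustrative value (an assumption):
    weights with very large alpha are dominated by noise and can be dropped.
    """
    return log_alpha < threshold

rng = np.random.default_rng(0)
x = np.ones((2, 3))
theta = np.array([1.0, 2.0, 3.0])
log_alpha = np.array([-4.0, 0.0, 6.0])  # third weight is almost pure noise

print(noisy_linear(x, theta, log_alpha, rng))
print(keep_mask(log_alpha))
```

In this sketch a feature is excluded by masking its weight once its learned noise level is large, which mirrors the idea of a dropout probability driven to 1.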

Until now, dropout rates have been hyperparameters of the model, so they could be set by any hyperparameter optimization technique, such as Random Search [4] or the Tree of Parzen Estimators [5]. However, all of these techniques require training a deep neural network many times, which makes them computationally inefficient. Some empirical techniques for tuning the dropout rate have been proposed [2, 3], but they do not provide a general solution. In our approach, dropout rates are model parameters and can be optimized during training. The approach also offers the appealing features of Bayesian models, such as ensembling and distributions in place of point estimates. Finally, there is currently no efficient way to train sparse neural networks and adaptively tune their topology; our approach may make this possible.
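Treating dropout rates as model parameters can be made concrete with the kind of objective used in Variational Dropout [1]. The following is a sketch under stated assumptions: a factorized Gaussian approximate posterior over the weights, with means θ and one noise variance α per weight, and a prior p(w). The symbols θ, α, q, and D (the training data) are notation chosen here for illustration.

```latex
q(w_{ij} \mid \theta_{ij}, \alpha_{ij})
  = \mathcal{N}\!\left(w_{ij} \mid \theta_{ij},\ \alpha_{ij}\,\theta_{ij}^{2}\right),
\qquad
\mathcal{L}(\theta, \alpha)
  = \mathbb{E}_{q(w \mid \theta, \alpha)}\!\left[\log p(D \mid w)\right]
  - D_{\mathrm{KL}}\!\left(q(w \mid \theta, \alpha)\,\middle\|\,p(w)\right)
  \;\to\; \max_{\theta,\, \alpha}
```

Because α enters the objective directly, it can be updated by the same gradient steps as θ within a single training run, rather than by retraining the network for each candidate dropout rate.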

[1] Kingma et al.: Variational Dropout and the Local Reparameterization Trick, NIPS15

[2] Wager et al.: Dropout Training as Adaptive Regularization, NIPS13

[3] Ba et al.: Adaptive dropout for training deep neural networks, NIPS13

[4] Bergstra et al.: Random Search for Hyper-Parameter Optimization, JMLR13

[5] Bergstra et al.: Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures, ICML13


Bayesian Methods Research Group
