Dropout: a simple way to prevent neural networks from overfitting
Authors: Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
The Journal of Machine Learning Research, Volume 15, Issue 1 (January 2014), pp. 1929–1958
Published: 01 January 2014
Abstract
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
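To make the idea in the abstract concrete, below is a minimal NumPy sketch of dropout, not the paper's reference implementation: during training each unit is retained with probability p (sampling a "thinned" network), and at test time the single unthinned network is used with activations, equivalently outgoing weights, scaled by p. The layer shape, the value of p, and the function names are illustrative assumptions.

```python
# Minimal sketch of the dropout idea (illustrative only): units are dropped at
# random during training; at test time the full network is used with outputs
# scaled by the keep probability p, approximating an average over the
# exponentially many thinned networks sampled during training.
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # assumed probability of retaining a unit (0.5 is a common choice for hidden layers)

def dropout(h, train):
    """Apply dropout to a layer's activations h (a NumPy array)."""
    if train:
        mask = (rng.random(h.shape) < p).astype(h.dtype)  # sample a thinned network
        return h * mask                                   # dropped units contribute nothing
    # Test time: single unthinned network with scaled (smaller) outputs.
    return h * p

# Toy usage on random activations.
h = rng.standard_normal((4, 8))
out_train = dropout(h, train=True)    # stochastic thinned activations
out_test = dropout(h, train=False)    # deterministic, scaled activations
```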
Index Terms (auto-classified): Computing methodologies → Machine learning → Machine learning approaches → Neural networks
Published in
The Journal of Machine Learning Research, Volume 15, Issue 1, January 2014, 4085 pages
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org
Publication History: Published 1 January 2014
Author Tags: regularization, deep learning, model combination, neural networks
Bibliometrics
Total Citations: 1,649
Total Downloads: 12,153
Downloads (Last 12 months): 3,119
Downloads (Last 6 weeks): 156
Publication link: https://dl.acm.org/doi/10.5555/2627435.2670313