
CBMM Special Seminar: Mikhail Belkin, Ohio State University




CBMM Special Seminar

Title: Fit without fear: an over-fitting perspective on modern deep and shallow learning

Speaker: Mikhail Belkin, Ohio State University
Venue: Singleton Auditorium (46-3002)
Address: MIT Bldg. 46, 43 Vassar St., Cambridge, MA 02139


A striking feature of modern supervised machine learning is its pervasive over-parametrization. Deep networks contain millions of parameters, often exceeding the number of data points by orders of magnitude. These networks are trained to nearly interpolate the data, driving the training error to zero. Yet, at odds with most theory, they show excellent test performance. It has become accepted wisdom that these properties are specific to deep networks and require non-convex analysis to understand.
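The over-parametrized, zero-training-error regime described above can be reproduced in a toy setting. The sketch below is an illustration assumed here, not code from the talk: a linear model with more parameters than data points (d = 20, n = 5), trained by plain stochastic gradient descent with batch size 1 on random data. Because the system is under-determined, an exact fit exists and SGD drives the training error to essentially zero. The constant step size 1 / max_i ||x_i||^2 is a conservative choice made for this sketch and is not the parameter-selection rule from the talk; the helper names `sgd_interpolate` and `training_mse` are hypothetical.

```python
import random


def sgd_interpolate(X, y, steps, seed=0):
    """Batch-size-1 SGD on the squared loss of a linear model w . x."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    # Constant step size (assumption for this sketch): 1 / max_i ||x_i||^2.
    eta = 1.0 / max(sum(v * v for v in row) for row in X)
    for _ in range(steps):
        i = rng.randrange(len(X))
        residual = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
        # Gradient of 0.5 * residual^2 with respect to w is residual * x_i.
        w = [wj - eta * residual * xj for wj, xj in zip(w, X[i])]
    return w


def training_mse(w, X, y):
    """Mean squared error of the linear model on the training set."""
    errors = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
    return sum(e * e for e in errors) / len(errors)


# Over-parametrized toy problem: 20 parameters, only 5 data points.
data_rng = random.Random(1)
n, d = 5, 20
X = [[data_rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [data_rng.gauss(0, 1) for _ in range(n)]  # arbitrary (noise-like) labels

initial = training_mse([0.0] * d, X, y)
w = sgd_interpolate(X, y, steps=20000)
final = training_mse(w, X, y)
print(f"training MSE: initial {initial:.3f}, after SGD {final:.2e}")
```

Because the labels here are pure noise, a zero training error says nothing about generalization; that gap is exactly the question the abstract raises.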

In this talk, I will show that classical (convex) kernel methods do, in fact, exhibit these unusual properties. Moreover, kernel methods provide a competitive practical alternative to deep learning, once we address the non-trivial challenges of scaling to modern big data. I will also present theoretical and empirical results indicating that we are unlikely to make progress on understanding deep learning until we develop a fundamental understanding of classical "shallow" kernel classifiers in the "modern" over-fitted setting. Finally, I will show that the ubiquitous stochastic gradient descent (SGD) algorithm is very effective at driving the training error to zero in the interpolated regime, a finding that sheds light on the effectiveness of modern methods and provides specific guidance for parameter selection.
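The claim that classical kernel methods can also interpolate can be seen in a minimal example. The sketch below is an assumed illustration, not the speaker's method or code: a Gaussian-kernel regressor with no ridge penalty, fit by solving the linear system K α = y exactly, so the predictor f(x) = Σ_j α_j k(x, x_j) passes through every training point even though the labels contain noise. The bandwidth, the data, and the helper names (`gaussian_kernel`, `solve`, `fit_interpolant`, `predict`) are hypothetical choices for illustration.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=0.3):
    """Gaussian (RBF) kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented copy so the caller's matrix is not modified.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_interpolant(xs, ys, bandwidth=0.3):
    """Coefficients alpha with K alpha = y: an exact fit, no regularization."""
    K = [[gaussian_kernel(a, b, bandwidth) for b in xs] for a in xs]
    return solve(K, ys)


def predict(alpha, xs, x, bandwidth=0.3):
    """Evaluate f(x) = sum_j alpha_j * k(x, x_j)."""
    return sum(a * gaussian_kernel(x, xj, bandwidth) for a, xj in zip(alpha, xs))


random.seed(0)
train_x = [0.3 * j for j in range(10)]
# Noisy labels: the interpolant fits the noise as well as the signal.
train_y = [math.sin(x) + random.gauss(0, 0.2) for x in train_x]

alpha = fit_interpolant(train_x, train_y)
train_residual = max(
    abs(predict(alpha, train_x, x) - y) for x, y in zip(train_x, train_y)
)
print(f"max training residual: {train_residual:.2e}")
print(f"prediction at x = 1.35: {predict(alpha, train_x, 1.35):.3f}")
```

The direct solve above costs O(n³) time and O(n²) memory for n training points, which illustrates why scaling kernel methods to modern data sizes is a non-trivial challenge.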

These results present a perspective and a challenge. Much of the success of modern learning comes into focus when considered from the point of view of over-parametrization and interpolation. The next step is to address the basic question of why classifiers in the "modern" interpolated setting generalize so well to unseen data. Kernel methods provide both a compelling set of practical algorithms and an analytical platform for resolving this fundamental issue.

Based on joint work with Siyuan Ma, Raef Bassily, Chaoyue Liu and Soumik Mandal.

Event webpage: https://cbmm.mit.edu/news-events/events/cbmm-special-seminar-fit-without-fear-over-fitting-perspective-modern-deep-and