Explore the intriguing world of over-parametrization in modern supervised machine learning through this 55-minute lecture by Mikhail Belkin from Ohio State University. Delve into the apparent paradox of deep networks with millions of parameters that interpolate their training data yet still generalize well to test data. Discover how classical kernel methods exhibit similar properties to deep learning and offer competitive alternatives when scaled to big data. Examine the effectiveness of stochastic gradient descent in driving training errors to zero in the interpolated regime. Gain insight into the challenges of understanding deep learning and the importance of developing a fundamental grasp of "shallow" kernel classifiers in over-fitted settings. Explore the perspective that much of modern learning's success can be understood through the lens of over-parametrization and interpolation, and consider the crucial question of why classifiers in this "modern" interpolated setting generalize so well to unseen data.
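
As a rough illustration of the interpolation phenomenon the lecture discusses, the minimal sketch below fits an unregularized RBF-kernel regressor on toy data: the training error is driven to (numerically) zero, yet the test error typically remains small. The kernel choice, data, and all names are illustrative assumptions, not material taken from the lecture itself.

```python
# Minimal sketch (illustrative, not from the lecture): kernel "interpolation" on a toy problem.
# An RBF-kernel regressor fit with no regularization exactly interpolates the training labels,
# yet can still predict reasonably well on held-out data.

import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=2.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Toy regression data: noisy samples of a smooth target function (assumed for illustration).
def target(x):
    return np.sin(3 * x[:, 0]) + 0.5 * np.cos(2 * x[:, 1])

X_train = rng.uniform(-1, 1, size=(80, 2))
y_train = target(X_train) + 0.1 * rng.standard_normal(80)
X_test = rng.uniform(-1, 1, size=(200, 2))
y_test = target(X_test)

# Interpolating solution: solve K alpha = y exactly (no ridge penalty),
# so the fitted function passes through every (noisy) training label.
K_train = rbf_kernel(X_train, X_train)
alpha = np.linalg.solve(K_train + 1e-10 * np.eye(len(X_train)), y_train)  # tiny jitter for numerics only

train_pred = K_train @ alpha
test_pred = rbf_kernel(X_test, X_train) @ alpha

print("train MSE:", np.mean((train_pred - y_train) ** 2))  # ~0: the training data is interpolated
print("test MSE: ", np.mean((test_pred - y_test) ** 2))    # usually still small despite "over-fitting"
```
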
Fit Without Fear - An Over-Fitting Perspective on Modern Deep and Shallow Learning