Interpolation does not overfit even for very noisy data
Why bounds fail
Interpolation is best practice for deep learning
Historical recognition
The key lesson
Generalization theory for interpolation?
A way forward?
Interpolated k-NN schemes
Interpolation and adversarial examples
Double descent risk curve
More parameters are better: an example
Random Fourier networks
What is the mechanism?
Double descent in random feature settings
Smoothness by averaging
Framework for modern ML
The landscape of generalization
Optimization: classical
Modern optimization
From classical statistics to modern ML
The nature of inductive bias
Memorization and interpolation
Interpolation in deep auto-encoders
Neural networks as models for associative memory
Why are attractors surprising?
Memorizing sequences
Description:
Explore the paradigm shift in machine learning theory presented by Professor Mikhail Belkin in this thought-provoking lecture. Delve into the apparent contradiction between classical statistical wisdom and modern deep learning practice, where over-parameterized models that fit the training data almost perfectly nevertheless show excellent test performance. Examine the challenges this poses to traditional Empirical Risk Minimization, and discover the emerging "double descent" risk curve that unifies classical and modern models. Investigate the nature of inductive bias in deep learning, particularly in auto-encoders and their potential implementation of associative memory. Gain insights into the evolving landscape of generalization theory, optimization techniques, and the role of memorization in neural networks.
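The "double descent" curve mentioned above can be reproduced in a small simulation. The following is a minimal sketch, not an experiment from the lecture: it assumes minimum-norm least squares on random Fourier features for a noisy 1-D regression task, and the target function, noise level, and feature counts are illustrative choices.

```python
# Sketch of double descent with random Fourier features (illustrative setup).
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)            # ground-truth function

n_train, noise = 20, 0.3
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + noise * rng.standard_normal(n_train)
x_test = rng.uniform(-1, 1, 200)
y_test = target(x_test)

def rff(x, n_features, seed=1):
    """Random Fourier features cos(w*x + b) with fixed random w, b."""
    r = np.random.default_rng(seed)
    w = r.normal(0, 10, n_features)
    b = r.uniform(0, 2 * np.pi, n_features)
    return np.cos(np.outer(x, w) + b)

for n_features in [2, 5, 10, 15, 20, 25, 40, 100, 500, 2000]:
    Phi_tr = rff(x_train, n_features)
    Phi_te = rff(x_test, n_features)
    # lstsq returns the minimum-norm solution once n_features > n_train,
    # i.e. the smoothest interpolant in the over-parameterized regime.
    coef, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)
    train_mse = np.mean((Phi_tr @ coef - y_train) ** 2)
    test_mse = np.mean((Phi_te @ coef - y_test) ** 2)
    print(f"{n_features:5d} features | train MSE {train_mse:.3f} | test MSE {test_mse:.3f}")

# Test error typically spikes near the interpolation threshold
# (n_features ~ n_train) and falls again as the model grows: double descent.
```

Run as a script, this prints training and test error for each feature count; the qualitative shape of the test-error column is the double descent curve discussed in the lecture.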
Beyond Empirical Risk Minimization - The Lessons of Deep Learning