From the statistical point of view, the success of DNNs is a mystery.
Why do Neural Networks work better?
The "adaptivity" conjecture
NTKs are strictly suboptimal for locally adaptive nonparametric regression
Are DNNs locally adaptive? Can they achieve optimal rates for TV-classes/Besov classes?
Background: Splines are piecewise polynomials
Background: Truncated power basis for splines
Weight decay = Total Variation Regularization
Weight-decayed L-Layer PNN is equivalent to Sparse Linear Regression with learned basis functions
Main theorem: Parallel ReLU DNN approaches the minimax rates as it gets deeper.
Comparing to classical nonparametric regression methods
Examples of Functions with Heterogeneous Smoothness
Step 2: Approximation Error Bound
Summary of take-home messages
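
To make the spline background in the outline above concrete (piecewise polynomials, the truncated power basis, and the total-variation penalty behind "Weight decay = Total Variation Regularization"), here is a standard textbook sketch; the notation is generic and not taken from the slides. For knots t_1 < ... < t_K, a degree-m spline can be written in the truncated power basis as

\[
  f(x) \;=\; \sum_{j=0}^{m} \beta_j\, x^{j} \;+\; \sum_{k=1}^{K} \theta_k\, (x - t_k)_+^{m},
  \qquad (z)_+ := \max(z, 0),
\]

and the locally adaptive (TV-regularized) spline estimator penalizes the total variation of the m-th derivative, which in this basis is an ℓ1 penalty on the knot coefficients:

\[
  \mathrm{TV}\bigl(f^{(m)}\bigr) \;=\; m!\,\sum_{k=1}^{K} \lvert \theta_k \rvert,
  \qquad
  \hat f \;\in\; \arg\min_{f}\; \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 \;+\; \lambda\,\mathrm{TV}\bigl(f^{(m)}\bigr).
\]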
Description:
Explore the intersection of deep learning and nonparametric regression in this 56-minute conference talk presented by Yu-Xiang Wang at the USC Information Sciences Institute. Delve into the capabilities of deep neural networks (DNNs) in curve fitting compared to classical tools such as splines and wavelets. Gain insight into why DNNs outperform kernels, why deep networks beat shallow ones, the significance of the ReLU activation, generalization under overparameterization, the role of sparsity in deep learning, and the validity of the lottery ticket hypothesis. Examine the statistical perspective on DNNs' success, explore the "adaptivity" conjecture, and understand how weight-decayed DNNs relate to total variation regularization. Learn about the equivalence between weight-decayed L-layer PNNs and sparse linear regression with learned basis functions. Discover how parallel ReLU DNNs approach minimax rates as they deepen, and compare their performance to classical nonparametric regression methods. Investigate examples of functions with heterogeneous smoothness and approximation error bounds.
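
As a rough illustration of the weight-decay/total-variation connection mentioned above, here is the standard two-layer ReLU version of the rescaling argument; the talk works with deeper, parallel networks, and biases are assumed here to be left undecayed. Because the ReLU is positively homogeneous,

\[
  f_\theta(x) \;=\; \sum_{j=1}^{M} a_j\,\bigl(w_j x + b_j\bigr)_+,
  \qquad
  a_j\,\bigl(w_j x + b_j\bigr)_+ \;=\; \frac{a_j}{c}\,\bigl(c\,w_j x + c\,b_j\bigr)_+ \quad \text{for every } c > 0,
\]

so each unit can be rescaled without changing the function, and the weight-decay penalty is smallest at the balanced scaling:

\[
  \min_{c_1,\dots,c_M > 0}\; \frac{\lambda}{2} \sum_{j=1}^{M} \Bigl( \frac{a_j^2}{c_j^2} + c_j^2\, w_j^2 \Bigr)
  \;=\; \lambda \sum_{j=1}^{M} \lvert a_j\, w_j \rvert.
\]

The right-hand side is an ℓ1-type (path-norm) penalty, which upper-bounds (and, for distinct kinks, equals) the total variation of f', i.e. the sum of the jumps in slope of the fitted piecewise-linear function.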
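
As a purely illustrative sketch of the kind of model being analyzed, the snippet below builds a weight-decayed "parallel" ReLU network in PyTorch: many narrow subnetworks whose outputs are summed, so that each subnetwork acts like one learned basis function and weight decay plays the role of the sparsity-inducing regularizer. The architecture and hyperparameters are placeholders, not the configuration from the talk.

# Hypothetical sketch of a weight-decayed parallel ReLU network for 1-D regression.
# Not the speaker's code; depth, width, and the number of subnetworks are placeholders.
import torch
import torch.nn as nn

class ParallelReLUNet(nn.Module):
    def __init__(self, in_dim=1, depth=4, width=2, n_subnets=128):
        super().__init__()

        def subnet():
            # One narrow, deep ReLU branch: Linear -> ReLU repeated `depth` times,
            # followed by a scalar readout.
            layers, d = [], in_dim
            for _ in range(depth):
                layers += [nn.Linear(d, width), nn.ReLU()]
                d = width
            layers.append(nn.Linear(d, 1, bias=False))
            return nn.Sequential(*layers)

        # Each branch plays the role of one learned basis function; summing their
        # outputs mimics a sparse linear combination of basis functions.
        self.subnets = nn.ModuleList([subnet() for _ in range(n_subnets)])

    def forward(self, x):
        return sum(net(x) for net in self.subnets)

# Toy usage: fit a noisy 1-D curve. The optimizer's weight_decay term is the L2
# penalty on all weights that the talk connects to total-variation regularization.
torch.manual_seed(0)
x = torch.rand(256, 1)
y = torch.sin(8 * x) + 0.1 * torch.randn_like(x)

model = ParallelReLUNet()
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
for _ in range(2000):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()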
Deep Learning Meets Nonparametric Regression: Are Weight-decayed DNNs Locally Adaptive?