Lecture outline:
1. Intro
2. Tensor (CP) decomposition
3. Why naïve algorithm fails
4. Why gradient descent?
5. Two-Layer Neural Network
6. Form of the objective
7. Difficulties of analyzing gradient descent
8. Lazy training fails
9. 0 is a high order saddle point
10. Our (high level) algorithm
11. Proof ideas
12. Iterates remain close to correct subspace
13. Escaping local minima by random correlation
14. Amplify initial correlation by tensor power method
15. Conclusions and Open Problems
Description:
Explore a lecture on over-parameterized tensor decomposition and training that goes beyond the lazy regime. Delve into the mathematical foundations and algorithms for tensor computations, focusing on how gradient descent variants can find approximate tensor decompositions. Learn about the limitations of the lazy training regime, the challenges in analyzing gradient descent, and a novel high-level algorithm that overcomes these obstacles. Discover how this research relates to training neural networks and exploiting low-rank structure in data. Gain insights into the proof ideas, including keeping iterates close to the correct subspace and escaping local minima through random correlation and the tensor power method.
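
As a rough sketch of the setting behind the outline items "Tensor (CP) decomposition" and "Form of the objective" (the exact formulation used in the lecture may differ): a symmetric order-3 tensor with a rank-r CP decomposition is fit by m >> r over-parameterized components, and gradient descent variants are run on the squared Frobenius loss.

T = \sum_{i=1}^{r} a_i \otimes a_i \otimes a_i,
\qquad
f(w_1, \dots, w_m) = \Big\| \sum_{j=1}^{m} w_j \otimes w_j \otimes w_j \; - \; T \Big\|_F^2.

Minimizing f over the components w_j can be viewed as training a two-layer neural network with a polynomial activation, which is how the outline connects the tensor problem to the "Two-Layer Neural Network" and "Lazy training fails" items: the lazy (kernel-style) analysis of this objective is what the lecture argues is insufficient.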

Beyond Lazy Training for Over-parameterized Tensor Decomposition

Institute for Pure & Applied Mathematics (IPAM)