Key ideas of the paper

Abstract

Note on k-NN non-parametric machine learning

Data and NPT setup explained

NPT loss is inspired by BERT

A high-level architecture overview

NPT jointly learns imputation and prediction

Architecture deep dive: input embeddings, etc.

More details on the stochastic masking loss

Connections to Graph Neural Networks and CNNs

NPT achieves great results on tabular data benchmarks

NPT learns the underlying relational, causal mechanisms

NPT does rely on other datapoints

NPT attends to similar vectors

Conclusions

Description:

This 46-minute video lecture walks through the key ideas of the paper "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning", which introduces Non-Parametric Transformers (NPT). It covers the NPT architecture and its BERT-inspired stochastic masking loss, its connections to Graph Neural Networks and CNNs, and its results on tabular data benchmarks. It also examines how NPT learns underlying relational and causal mechanisms, relies on other datapoints, and attends to similar vectors.

Non-Parametric Transformers - Paper Explained

Aleksa Gordić - The AI Epiphany

#Computer Science #Artificial Intelligence #Neural Networks #Deep Learning #Attention Mechanisms
