Explore cutting-edge approaches to sensor fusion and representation learning for autonomous driving in this 33-minute conference talk. Delve into the limitations of geometry-based fusion methods and discover TransFuser, a novel Multi-Modal Fusion Transformer that integrates image and LiDAR representations using attention mechanisms. Learn about NEural ATtention fields (NEAT), an innovative representation for reasoning about the semantic, spatial, and temporal structure of driving scenes. Examine state-of-the-art results on the CARLA driving simulator, and gain insights into attention-map visualizations, bird's-eye-view (BEV) semantics, and the architectural details of these driving models.
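
The core idea behind attention-based sensor fusion, as the talk describes it, is that tokens from both modalities attend to each other rather than being merged by fixed geometric projection. A minimal, dependency-free sketch of that mechanism (not the actual TransFuser architecture; the toy token values and dimensions here are illustrative assumptions) might look like:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens, d):
    """Scaled dot-product self-attention over the combined token set.

    Every token (from either modality) attends to every token, which is
    how a fusion transformer lets image features condition on LiDAR
    features and vice versa. Queries, keys, and values are the raw
    tokens here; a real model would apply learned projections.
    """
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

# Hypothetical toy features: two image tokens and two LiDAR tokens, dim 4.
image_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
lidar_tokens = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Concatenating the modalities and running self-attention yields fused
# tokens that mix information from both sensors.
fused = self_attention(image_tokens + lidar_tokens, d=4)
```

Each fused token is a convex combination of all input tokens, so image-derived outputs carry LiDAR information and vice versa; in the full model this block is stacked and interleaved with convolutional feature extractors.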