Explore transfer learning and pre-trained contextualized representations in this 20-minute conference talk from KDD 2020. Dive into BERT and its improvements, including efficient span-based pre-training and RoBERTa. Learn about extractive question answering (QA), the GLUE benchmark, and the challenges that remain in the field. Discover potential future directions such as few-shot learning and non-parametric memories. Gain insights from Mandar Joshi on advancing natural language processing through innovative pre-training approaches and model architectures.