Learn about multi-task learning in transformer-based NLP architectures in this 31-minute conference talk, which presents it as a cost-effective alternative to training a separate model for each task. Discover how sharing information across multiple tasks and datasets can improve performance through shared models, representation bias, increased data efficiency, and eavesdropping. Explore solutions to challenges such as catastrophic forgetting and interference, along with general approaches to multi-task learning, adapter-based techniques, hypernetwork methods, and strategies for task sampling and balancing. Topics include the BERT paper, architecture considerations, modularity, function composition, input composition, parameter composition, fusion techniques, and shared hypernetworks, concluding with insights into ChatGPT.
Multi-Task Learning in Transformer-Based Architectures for Natural Language Processing