Learn about three distinct approaches to expanding Large Language Models (LLMs) beyond text-only capabilities in this technical video presentation. Explore the evolution of multimodal AI systems, starting with the integration of external tools, moving to adapter-based architectures, and culminating in unified multimodal models. See a practical, hands-on demonstration that uses LLaMA 3.2 for vision tasks via Ollama, grounded in academic research and real-world applications. Gain insight into the future trajectory of multimodal AI development, with references to key research papers and technical resources for further exploration.
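As a minimal sketch of the kind of vision query demonstrated in the video, the snippet below builds the chat payload that Ollama's API expects for an image-plus-text prompt. The model name `llama3.2-vision` and the image path `photo.jpg` are illustrative placeholders; actually running the query requires a local Ollama server with the model pulled, so the call itself is shown only in a comment.

```python
import json

def build_vision_request(prompt, image_path, model="llama3.2-vision"):
    """Build the chat payload Ollama expects for a vision query.

    Ollama accepts image inputs as a list under the "images" key of a
    user message, alongside the text prompt in "content".
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "images": [image_path],  # file path or base64-encoded image data
            }
        ],
    }

# Placeholder prompt and image; swap in your own before running the demo.
payload = build_vision_request("Describe this image.", "photo.jpg")
print(json.dumps(payload, indent=2))

# With the ollama Python package installed and the server running,
# the call would look like (not executed here):
#   import ollama
#   response = ollama.chat(**payload)
#   print(response["message"]["content"])
```

This keeps the request construction separate from the network call, so the payload can be inspected or logged before any model is invoked.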
Multimodal AI: Understanding Large Language Models with Vision and Audio Capabilities