1. Introduction
2. Multimodal LLMs
3. Path 1: LLM + Tools
4. Path 2: LLM + Adapters
5. Path 3: Unified Models
6. Example: LLaMA 3.2 for Vision Tasks with Ollama
7. What's next?
Description:
Learn about three distinct approaches to expanding Large Language Models (LLMs) beyond text-only capabilities in this technical video presentation. Explore the evolution of multimodal AI systems, starting with the integration of external tools, moving to adapter-based architectures, and culminating in unified models. Discover practical implementations through a hands-on demonstration using LLaMA 3.2 for vision tasks with Ollama, supported by extensive academic research and real-world applications. Gain insights into the future trajectory of multimodal AI development, with comprehensive references to key research papers and technical resources for further exploration.
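The hands-on portion of the video runs LLaMA 3.2 vision through Ollama. As a rough illustration of what such a call involves, the sketch below builds the JSON payload that Ollama's `/api/chat` endpoint accepts for a vision request; the model name `llama3.2-vision` and the payload shape follow Ollama's public API, but the exact invocation used in the presentation may differ.

```python
import base64
import json

def build_vision_request(prompt: str, image_bytes: bytes,
                         model: str = "llama3.2-vision") -> str:
    """Build the JSON body for an Ollama /api/chat vision request.

    Ollama accepts images as base64-encoded strings attached to a
    user message alongside the text prompt.
    """
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "stream": False,  # ask for a single complete response
    }
    return json.dumps(payload)

# Example with placeholder bytes standing in for a real image file:
body = build_vision_request("Describe this image.", b"\x89PNG-placeholder")
print(json.loads(body)["model"])
```

In practice this body would be POSTed to a locally running Ollama server (by default at `http://localhost:11434/api/chat`) after pulling the model with `ollama pull llama3.2-vision`.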

Multimodal AI: Understanding Large Language Models with Vision and Audio Capabilities

Shaw Talebi