Learn about three distinct approaches to expanding Large Language Models (LLMs) beyond text-only capabilities in this technical video presentation. Explore the evolution of multimodal AI systems, starting with the integration of external tools, moving to adapter-based architectures, and culminating in unified multimodal models. See a practical, hands-on demonstration that uses LLaMA 3.2 for vision tasks via Ollama, grounded in academic research and real-world applications. Gain insight into the future trajectory of multimodal AI development, with references to key research papers and technical resources for further exploration.
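As a minimal sketch of the kind of vision query demonstrated in the video, the snippet below builds the chat payload that Ollama's API expects for an image-plus-text prompt. The model name `llama3.2-vision` and the image path `photo.jpg` are illustrative placeholders; actually running the query requires a local Ollama server with the model pulled, so the call itself is shown only in a comment.

```python
import json

def build_vision_request(prompt, image_path, model="llama3.2-vision"):
    """Build the chat payload Ollama expects for a vision query.

    Ollama accepts image inputs as a list under the "images" key of a
    user message, alongside the text prompt in "content".
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                "images": [image_path],  # file path or base64-encoded image data
            }
        ],
    }

# Placeholder prompt and image; swap in your own before running the demo.
payload = build_vision_request("Describe this image.", "photo.jpg")
print(json.dumps(payload, indent=2))

# With the ollama Python package installed and the server running,
# the call would look like (not executed here):
#   import ollama
#   response = ollama.chat(**payload)
#   print(response["message"]["content"])
```

This keeps the request construction separate from the network call, so the payload can be inspected or logged before any model is invoked.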
Multimodal AI: Understanding Large Language Models with Vision and Audio Capabilities