Tiny Text and Vision Models - Fine-Tuning and API Setup

Explore the intricacies of fine-tuning and deploying tiny text and vision models in this 44-minute tutorial. Dive into the architecture of multi-modal models, focusing on the Moondream model's components: its vision encoder (SigLIP), MLP (vision projection), and language model (Phi). Learn how to apply LoRA adapters to multi-modal models and follow along with a hands-on fine-tuning notebook demo. Discover techniques for deploying custom APIs for multi-modal models, serving with vLLM, and training models from scratch. Gain insights into multi-modal datasets and explore additional video resources on advanced vision and language processing techniques.
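The Moondream architecture described above chains three components: a SigLIP vision encoder, an MLP projection, and a Phi language model. Below is a minimal PyTorch sketch of that wiring; the class name, the MLP shape, and the dimensions (1152 for SigLIP features, 2048 for a Phi-1.5 embedding space) are illustrative assumptions, not Moondream's actual implementation.

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Schematic Moondream-style pipeline: SigLIP patch features are
    projected by an MLP into the embedding space of a Phi decoder.
    Names and dimensions are illustrative, not Moondream's real code."""

    def __init__(self, vision_encoder, language_model,
                 vision_dim=1152, text_dim=2048):
        super().__init__()
        self.vision_encoder = vision_encoder      # e.g. a SigLIP ViT backbone
        self.vision_projection = nn.Sequential(   # the MLP projector
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )
        self.language_model = language_model      # e.g. a Phi decoder (HF-style)

    def forward(self, pixel_values, input_ids):
        # (batch, num_patches, vision_dim) patch features from the image
        image_features = self.vision_encoder(pixel_values)
        # map patch features into the text-embedding space
        image_embeds = self.vision_projection(image_features)
        # embed the prompt tokens and prepend the projected image tokens,
        # then run the decoder over the combined sequence
        text_embeds = self.language_model.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)
```

For the LoRA portion, a hedged sketch with the PEFT library follows, assuming `model` is the loaded multi-modal model. The `target_modules` names are assumptions for a Phi-style decoder (inspect `model.named_modules()` for the real projection names), and the rank and alpha values are typical defaults rather than the tutorial's settings.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,              # adapter rank
    lora_alpha=32,     # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed names
)
model = get_peft_model(model, lora_config)  # freezes the base weights
model.print_trainable_parameters()          # only the LoRA matrices train
```

A common choice for multi-modal fine-tuning is to leave the vision encoder frozen and attach adapters only to the language model (and sometimes the projection MLP), which keeps the trainable parameter count small enough for a single consumer GPU.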
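For the custom-API portion, here is a minimal serving sketch using FastAPI. The checkpoint ID `vikhyatk/moondream2` and the `encode_image`/`answer_question` helpers follow Moondream's published remote-code interface, but treat the exact method names and the endpoint shape as assumptions to verify against the version you install.

```python
import base64
import io

from fastapi import FastAPI
from PIL import Image
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"  # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

app = FastAPI()

class Query(BaseModel):
    image_b64: str  # base64-encoded image bytes
    question: str

@app.post("/ask")
def ask(query: Query):
    # decode the image, encode it once, then answer the question
    image = Image.open(io.BytesIO(base64.b64decode(query.image_b64)))
    encoded = model.encode_image(image)  # Moondream remote-code helper
    answer = model.answer_question(encoded, query.question, tokenizer)
    return {"answer": answer}
```

Run it with `uvicorn server:app` and POST a base64-encoded image plus a question to `/ask`. For higher-throughput serving, vLLM exposes an OpenAI-compatible server (depending on the version, `vllm serve <model>` or `python -m vllm.entrypoints.openai.api_server --model <model>`); multi-modal support in vLLM is model-dependent, so check its documentation for the checkpoint you deploy.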