1. Fine-tuning tiny multi-modal models
2. Moondream server demo
3. Video overview
4. Multi-modal model architecture
5. Moondream architecture
6. Moondream vision encoder (SigLIP)
7. Moondream MLP vision projection
8. Moondream language model (Phi)
9. Applying LoRA adapters to a multi-modal model
10. Fine-tuning notebook demo
11. Deploying a custom API for multi-modal models
12. vLLM
13. Training a multi-modal model from scratch
14. Multi-modal datasets
15. Video resources
Description:
Explore the intricacies of fine-tuning and deploying tiny text and vision models in this 44-minute tutorial. Dive into the architecture of multi-modal models, focusing on the Moondream model's components, including its vision encoder (SigLIP), MLP vision projection, and language model (Phi). Learn how to apply LoRA adapters to multi-modal models and follow along with a hands-on fine-tuning notebook demo. Discover techniques for deploying custom APIs for multi-modal models, utilizing vLLM, and training models from scratch. Gain insights into multi-modal datasets and access a wealth of video resources to further your understanding of advanced vision and language processing techniques.
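As a rough illustration of the LoRA step described above, the sketch below attaches adapters to a small vision-language model with Hugging Face PEFT. The checkpoint id (`vikhyatk/moondream2`), the target module names, and the hyperparameters are assumptions for illustration, not necessarily the exact setup used in the video.

```python
# Minimal sketch: attaching LoRA adapters to a tiny vision-language model.
# Assumes the `transformers` and `peft` libraries are installed; the checkpoint
# id and target module names below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",      # assumed checkpoint id
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

# Target only the language-model attention projections so the SigLIP vision
# encoder and the MLP vision projection stay frozen during fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # verify only adapter weights are trainable
```

With adapters attached, the wrapped model can be passed to a standard training loop or trainer; only the small LoRA weight matrices are updated, which keeps fine-tuning feasible on modest hardware.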

Tiny Text and Vision Models - Fine-Tuning and API Setup

Trelis Research