Chapters:
1. "Video + Text" from "Image + Text" models
2. Clipping and querying videos with an IDEFICS 2 endpoint
3. Fine-tuning video + text models
4. Dataset generation for video fine-tuning + pushing to hub
5. Clipping and querying videos with image splitting in a Jupyter Notebook
6. Side note: IDEFICS 2 vision-to-text adapter architecture
7. Video clip notebook evaluation (continued)
8. Loading a video dataset for fine-tuning
9. Recap of video + text model fine-tuning
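The clipping step mentioned in the chapter list (chapters 2 and 5) amounts to turning a video into a small set of frames that an image + text model can consume. A minimal sketch of that sampling logic, assuming evenly spaced frames over a clip window (the function name and parameters here are illustrative, not taken from the course):

```python
def clip_frame_indices(total_frames: int, fps: float,
                       clip_start_s: float, clip_end_s: float,
                       n_frames: int = 8) -> list[int]:
    """Return up to n_frames evenly spaced frame indices covering the
    clip [clip_start_s, clip_end_s) of a video at the given fps.
    These indices would then be used to extract frames (e.g. with a
    video library) and feed them to a vision-text model."""
    start = int(clip_start_s * fps)
    end = min(int(clip_end_s * fps), total_frames)
    if end <= start:
        raise ValueError("clip window contains no frames")
    step = max((end - start) // n_frames, 1)
    return list(range(start, end, step))[:n_frames]

# Example: sample 8 frames from the first 4 seconds of a 30 fps video.
indices = clip_frame_indices(total_frames=300, fps=30,
                             clip_start_s=0, clip_end_s=4, n_frames=8)
# indices == [0, 15, 30, 45, 60, 75, 90, 105]
```

Sampling a fixed number of frames per clip keeps the token budget constant regardless of clip length, which matters when each frame is split into multiple image tiles.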
Description:
Learn how to fine-tune multi-modal video and text models in this comprehensive tutorial. Explore techniques for clipping and querying videos using an IDEFICS 2 endpoint, generate datasets for video fine-tuning, and push them to a hub. Discover methods for image splitting in Jupyter Notebooks and understand the IDEFICS 2 vision-to-text adapter architecture. Follow along as the instructor demonstrates loading video datasets for fine-tuning and provides a recap of the entire process. Gain valuable insights into transforming image and text models into powerful video and text analysis tools.
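The dataset-generation step described above can be sketched as building chat-style records that pair a clip with a question and an answer. The field names and message layout below are assumptions in the spirit of common vision-text fine-tuning formats, not the exact schema used in the video:

```python
def build_record(clip_path: str, question: str, answer: str) -> dict:
    """One training example in a chat-style layout that vision-text
    processors commonly expect. Field names are illustrative."""
    return {
        "clip": clip_path,
        "messages": [
            {"role": "user",
             "content": [{"type": "video"},
                         {"type": "text", "text": question}]},
            {"role": "assistant",
             "content": [{"type": "text", "text": answer}]},
        ],
    }

records = [
    build_record("clips/clip_000.mp4",
                 "What happens in this clip?",
                 "A person opens a laptop."),
]

# With the `datasets` library installed and a Hub token configured,
# such records could then be pushed (repo name is a placeholder):
#   from datasets import Dataset
#   Dataset.from_list(records).push_to_hub("your-user/video-qa-clips")
```

Keeping the records in a chat layout lets the model's own chat template render them into training text without a custom formatting step.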

Fine-tuning Multi-modal Video and Text Models

Trelis Research