Learn about an innovative approach to Large Language Model training in this 24-minute technical presentation that introduces ORPO (Odds Ratio Preference Optimization), a groundbreaking "reference model-free" monolithic preference optimization algorithm. Explore the theoretical physics perspective behind this new preference-aligned Supervised Fine-Tuning (SFT) method, examining parallels between regularization-term methodologies and Lagrange multipliers. Delve into how ORPO eliminates the need for a separate preference alignment phase, and see how its performance compares against Llama 2 and Mistral 7B models. Based on research from the paper "ORPO: Monolithic Preference Optimization without Reference Model," gain insights into this streamlined approach that folds preference alignment directly into the supervised training process.
ORPO: A New Preference-Aligned Training Method for Large Language Models
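To make the idea concrete, here is a minimal PyTorch sketch of an ORPO-style objective: the usual SFT negative log-likelihood plus an odds-ratio term that pushes chosen responses to be more likely than rejected ones, with no reference model involved. The function name orpo_loss, the beta weight, and the use of length-normalized log-probabilities are illustrative assumptions, not code taken from the presentation or the paper.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              sft_nll: torch.Tensor,
              beta: float = 0.1) -> torch.Tensor:
    """Sketch of an ORPO-style objective (hypothetical helper).

    chosen_logps / rejected_logps: length-normalized log P(y|x) for the
    chosen and rejected completions, shape [batch].
    sft_nll: standard next-token negative log-likelihood on the chosen
    completions.
    beta: weight on the odds-ratio term (illustrative default).
    """
    # odds(y|x) = P(y|x) / (1 - P(y|x)); compute its log in a stable way.
    def log_odds(logp: torch.Tensor) -> torch.Tensor:
        return logp - torch.log1p(-torch.exp(logp))

    # Log odds ratio between the chosen and rejected completions.
    log_or = log_odds(chosen_logps) - log_odds(rejected_logps)

    # Penalize cases where the rejected response has higher odds than
    # the chosen one; -log(sigmoid(.)) shrinks as the margin grows.
    or_term = -F.logsigmoid(log_or).mean()

    # Monolithic objective: SFT loss plus the preference-alignment term.
    return sft_nll + beta * or_term
```

In this framing, the single loss plays both roles: the SFT term keeps the model learning the chosen responses, while the odds-ratio term acts like a soft constraint discouraging the rejected ones, which is the sense in which the presentation draws a parallel to regularization terms and Lagrange multipliers.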