Learn about an innovative approach to Large Language Model training in this 24-minute technical presentation that introduces ORPO (Odds Ratio Preference Optimization), a groundbreaking "reference model-free" monolithic preference optimization algorithm. Explore the theoretical physics perspective behind this new preference-aligned Supervised Fine-Tuning (SFT) method, examining parallels between regularization-term methodologies and Lagrange multipliers. Delve into how ORPO eliminates the need for a separate preference alignment phase, and see how its performance compares against Llama 2 and Mistral 7B models. Based on research from the paper "ORPO: Monolithic Preference Optimization without Reference Model," gain insights into this streamlined approach that folds preference alignment directly into the supervised training process.
ORPO: A New Preference-Aligned Training Method for Large Language Models
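To make the idea concrete, here is a minimal PyTorch sketch of an ORPO-style objective: the usual SFT negative log-likelihood plus an odds-ratio term that pushes chosen responses to be more likely than rejected ones, with no reference model involved. The function name orpo_loss, the beta weight, and the use of length-normalized log-probabilities are illustrative assumptions, not code taken from the presentation or the paper.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              sft_nll: torch.Tensor,
              beta: float = 0.1) -> torch.Tensor:
    """Sketch of an ORPO-style objective (hypothetical helper).

    chosen_logps / rejected_logps: length-normalized log P(y|x) for the
    chosen and rejected completions, shape [batch].
    sft_nll: standard next-token negative log-likelihood on the chosen
    completions.
    beta: weight on the odds-ratio term (illustrative default).
    """
    # odds(y|x) = P(y|x) / (1 - P(y|x)); compute its log in a stable way.
    def log_odds(logp: torch.Tensor) -> torch.Tensor:
        return logp - torch.log1p(-torch.exp(logp))

    # Log odds ratio between the chosen and rejected completions.
    log_or = log_odds(chosen_logps) - log_odds(rejected_logps)

    # Penalize cases where the rejected response has higher odds than
    # the chosen one; -log(sigmoid(.)) shrinks as the margin grows.
    or_term = -F.logsigmoid(log_or).mean()

    # Monolithic objective: SFT loss plus the preference-alignment term.
    return sft_nll + beta * or_term
```

In this framing, the single loss plays both roles: the SFT term keeps the model learning the chosen responses, while the odds-ratio term acts like a soft constraint discouraging the rejected ones, which is the sense in which the presentation draws a parallel to regularization terms and Lagrange multipliers.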