Chapters:
1. OpenAI o1-type techniques for scaling test-time compute
2. Video overview: temperature, chain of thought
3. Training compute versus test-time compute
4. Why spend more compute on test time / inference?
5. Using verifiers to select the best answers
6. Exploring and critiquing/verifying answers during inference
7. Understanding temperature for sampling
8. Should you set temperature to zero?
9. Beam search
10. Problems with setting a non-zero temperature
11. Using top-p, top-k, min-p, and best-of
12. Recap on choosing temperature for sampling
13. How to implement chain of thought
14. Setup for notebook run-through on GSM8K and HotpotQA
15. Using sampling and chain of thought on HotpotQA and GSM8K
16. Running vLLM in a Jupyter notebook allows for batching
17. Scoring / grading with OpenAI GPT-4o mini using regex enforcement
18. Multi-threading the scoring / grading for speed
19. Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
20. Controlling sampling parameters: min-p, top-p, top-k, beam search, temperature
21. Running temperature / sampling ablations WITHOUT chain of thought
22. Chain-of-thought setup
23. Running ablations WITH chain of thought
24. GSM8K results charts
25. HotpotQA results charts
26. Recommendations on sampling, temperature, and chain of thought
27. Video resources
Description:
Explore advanced inference techniques for large language models in this comprehensive 55-minute video lecture. Dive into the differences between training and test-time compute, and learn why investing more resources in inference can be beneficial. Discover how to use verifiers for selecting optimal answers and explore methods for critiquing responses during inference. Gain a deep understanding of temperature in sampling, including when to use zero or non-zero values, and explore alternatives like beam search, top-p, top-k, and min-p sampling. Learn to implement chain-of-thought reasoning and see practical demonstrations using datasets like GSM8K and HotpotQA. Follow along with notebook run-throughs, learn to use vLLM for efficient batching, and discover techniques for scoring and grading responses. Analyze the results of various sampling and chain-of-thought configurations through detailed charts, and receive expert recommendations on optimizing these parameters for different tasks.
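As a rough illustration of how the sampling controls covered in the video interact, here is a minimal, self-contained sketch of temperature scaling followed by top-k, top-p (nucleus), and min-p filtering over a toy logit vector. The function name and structure are illustrative assumptions, not code from the video:

```python
import math

def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return the token distribution after temperature scaling and filtering."""
    # Temperature 0 is conventionally greedy decoding: all mass on the argmax token.
    if temperature == 0:
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep = set(order)
    # top-k: keep only the k most probable tokens.
    if top_k > 0:
        keep &= set(order[:top_k])
    # top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p < 1.0:
        cum, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus
    # min-p: drop tokens whose probability is below min_p times the top token's.
    if min_p > 0.0:
        threshold = min_p * probs[order[0]]
        keep &= {i for i in order if probs[i] >= threshold}
    # Zero out filtered tokens and renormalize the survivors.
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(filtered)
    return [p / z for p in filtered]
```

In practice you would not implement this by hand: inference libraries such as vLLM expose these knobs directly as sampling parameters (`temperature`, `top_p`, `top_k`, `min_p`), which is how the notebook ablations in the video vary them.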

Test Time Compute: Sampling and Chain of Thought Techniques

Trelis Research