Massachusetts Institute of Technology
Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer
Discover how to tune GPT-3 hyperparameters efficiently on a single GPU with µTransfer, a technique based on the Maximal Update Parametrization (µP) that enables hyperparameter optimization for large language models with minimal compute resources.