Rutgers University Department of Physics and Astronomy

Greg Yang
(Microsoft Research)

Title: Renormalizing the Optimal Hyperparameters of a Neural Network

Abstract: Hyperparameter tuning in deep learning is an expensive process, prohibitively so for neural networks with billions of parameters, which often can be trained only once. We show that, in the recently discovered Maximal Update Parametrization (μP), many optimal hyperparameters remain stable even as model size changes. Using this insight, we are able, for example, to re-tune the 6.7-billion-parameter version of GPT-3 and obtain performance comparable to the 13-billion-parameter version, effectively doubling the model size.
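As a rough illustration of the idea behind μP-style hyperparameter transfer, the sketch below shows how hyperparameters tuned on a small proxy model can be rescaled to a wider model. The scaling rules used here (for hidden weight matrices trained with Adam, learning rate ∝ 1/width and initialization standard deviation ∝ 1/√width) follow the published μP recipe, but this is a simplified sketch, not the full parametrization; the function name and base values are illustrative assumptions.

```python
import math

def mup_hidden_hparams(base_lr, base_std, base_width, width):
    """Transfer hyperparameters tuned at base_width to a new width under
    muP (simplified sketch): for hidden weight matrices trained with
    Adam, the learning rate scales like 1/width and the initialization
    standard deviation like 1/sqrt(width)."""
    ratio = width / base_width
    return {
        "lr": base_lr / ratio,                    # Adam LR ~ 1/width
        "init_std": base_std / math.sqrt(ratio),  # init std ~ 1/sqrt(width)
    }

# Tune once on a small proxy model, then reuse the same base values at
# any larger width; the muP rescaling keeps them near-optimal.
small = mup_hidden_hparams(base_lr=1e-2, base_std=0.02, base_width=256, width=256)
large = mup_hidden_hparams(base_lr=1e-2, base_std=0.02, base_width=256, width=4096)
print(small)  # {'lr': 0.01, 'init_std': 0.02}
print(large)  # {'lr': 0.000625, 'init_std': 0.005}
```

In practice this is what makes tuning billion-parameter models affordable: the expensive hyperparameter search happens only on the small proxy, and the result transfers to the large model.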
