Colloquium "A Functional Perspective for Understanding Scaling Laws"

25 November, 14:40–16:00

Speaker: Prof. Lei Wu, Peking University

Biography:
Lei Wu is an Assistant Professor at the School of Mathematical Sciences and the Center for International Machine Learning Research at Peking University. His research focuses on the mathematical foundations of deep learning. He received his B.S. in Mathematics and Applied Mathematics from Nankai University in 2012, and his Ph.D. in Computational Mathematics from Peking University in 2018. From November 2018 to October 2021, he conducted postdoctoral research at Princeton University and the University of Pennsylvania. His work has been published in top conferences and journals such as NeurIPS, ICML, AOS, TIT, and JMLR.

Abstract:
Scaling laws for large language models (LLMs) reveal a striking empirical regularity: model performance improves according to predictable power laws as training data and compute scale. These laws have profoundly shaped the development of modern AI, yet their origins have remained largely empirical and theoretically unexplained. To uncover the underlying mechanism, we introduce power-law kernel regression, a minimal yet structurally faithful model that captures the essential ingredients driving scaling behavior. By analyzing its stochastic training dynamics through a continuous-time stochastic differential equation, we develop the framework of Functional Scaling Laws (FSL). FSL elevates classical scaling laws from predicting a final-step loss to predicting the entire loss trajectory. This functional viewpoint reveals an intrinsic-time structure that unifies training dynamics across model sizes, data scales, and learning-rate schedules. In particular, FSL provides a principled explanation for why widely used schedules—such as warmup–stable–decay—are so effective. Finally, experiments on LLM pre-training demonstrate that FSL offers a principled framework for both understanding and guiding large-scale model training.

Zoom
