HDI&TFAIM Lab Seminar: «Learning the optimal control policy for fine-tuning a given diffusion process»
Abstract: We study the problem of learning the optimal control policy for fine-tuning a given diffusion process, using general value function approximation. We develop a new class of algorithms by solving a variational inequality problem based on the Hamilton-Jacobi-Bellman (HJB) equations. We prove sharp statistical rates for the learned value function and control policy, depending on the complexity and approximation errors of the function class. In contrast to generic reinforcement learning problems, our approach shows that fine-tuning can be achieved via supervised regression, with faster statistical rate guarantees.
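For context, the HJB equations mentioned in the abstract characterize the value function of a controlled diffusion. The following is only an illustrative sketch under standard assumptions (drift control with a quadratic running cost, constant diffusion coefficient σ, and terminal reward r); the speaker's precise setting may differ:

```latex
% Controlled diffusion: dX_t = (b(X_t, t) + u_t)\,dt + \sigma\, dW_t
% Value function V(x, t) with terminal condition V(x, T) = r(x).
% The HJB equation for this quadratic-cost control problem reads
\partial_t V(x,t)
  + \sup_{u}\Big\{ \big(b(x,t) + u\big)^{\top} \nabla_x V(x,t)
  - \tfrac{1}{2}\|u\|^{2} \Big\}
  + \tfrac{\sigma^{2}}{2}\,\Delta_x V(x,t) = 0,
% with the supremum attained at the optimal control
u^{*}(x,t) = \nabla_x V(x,t).
```

Because the optimal control is a simple function of the value-function gradient, estimating V (e.g., by regression) directly yields a control policy, which is consistent with the abstract's claim that fine-tuning reduces to supervised regression.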
Bio: Wenlong Mou is an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. In 2023, he received his Ph.D. degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley. Prior to Berkeley, he received his B.Sc. degree in Computer Science and B.A. degree in Economics, both from Peking University. Wenlong's research interests include machine learning theory, mathematical statistics, optimization, and applied probability. He is particularly interested in data-driven decision-making in modern AI paradigms. His work has been published in leading journals in statistics and machine learning, and his research has been recognized by the INFORMS Applied Probability Society as a Best Student Paper finalist.
For any questions, please contact Karina Mikhailovna Zelenova (kzelenova@hse.ru) or Ekaterina Dmitrievna Gornostaeva (egornostaeva@hse.ru).