Research Seminar
Joint research seminar of the International Laboratory of Stochastic Algorithms and High-Dimensional Inference and the Laboratory of Theoretical Foundations of Artificial Intelligence.
The seminar brings together researchers, undergraduate and PhD students from HSE University, Skoltech, IITP RAS, and other leading Russian universities and research centers. It covers problems at the interface of mathematics and computer science: high-dimensional probability and mathematical statistics, machine learning, optimization, numerical methods, and more.
Seminar organizers: Nikita Puchkin and Sergey Samsonov.
For speakers
If you would like to give a talk at the seminar, please send a request to npuchkin@hse.ru or svsamsonov@hse.ru with the topic of the talk, an abstract in English (at most half a page), and suitable dates. Speakers should indicate in advance the preferred format of the talk: blackboard or slides (a blackboard talk is preferred). A talk lasts 1 hour (if necessary, it can be extended to 1.5-2 hours).
In the fall semester of 2024 the seminar meets on Thursdays at 14:40.
Follow updates in our Telegram channel: HDI and TFAIM Lab joint seminar.
If you need a pass to the HSE University building, please write to Karina Zelenova at kzelenova@hse.ru.
2024
Date | Speaker | Title | Abstract |
18.01.2024 |
Marina Sheshukova (HSE University) and Arthur Goldman (HSE University) |
Part I: Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements |
At the seminar, we will discuss the general formulation of the variational inequality problem. We will analyze two algorithms, Stochastic Gradient Descent Ascent and Double Step-size Stochastic Extra Gradient, as well as their theoretical properties.
In Part II, we will look at how convergence of measures in the Wasserstein metric can be proved via the construction of couplings. |
25.01.2024 | Alexander Beznosichov (MIPT) | On Distributed Optimization Problems under Similarity: Optimal Algorithms and Fast Extensions | In this presentation, we consider distributed methods for solving optimization problems, namely variational inequalities, which are interesting both in themselves and because they include both classical minimization problems and minimax problems. In the distributed formulation, the target operator of a variational inequality is divided into parts, and each of these parts can be accessed only by a local agent/worker. We deal with the case where the local operators are "similar" to each other in some sense. Due to the "similarity" it is possible to achieve a significant acceleration of the theoretical guarantees of convergence of methods in terms of estimates on communication complexity. Besides the issue of convergence of algorithms and obtaining upper bounds, we touch upon lower complexity bounds and verify the optimality of the proposed methods. In the remaining time, we try to discuss the question of how we can "break through" the lower estimates and construct an even faster method, in particular, we additionally introduce the possibility of compressing the transmitted information, modify the proposed algorithms and obtain upper and lower bounds in a new formulation. |
15.02.2024 | Ilya Levin (HSE University) | Central Limit Theorems for Stochastic Approximation | In this talk we will go through two papers on CLT analysis for stochastic approximation (SA). The first paper (https://www.esaim-ps.org/articles/ps/pdf/2015/01/ps140013.pdf) presents a CLT analysis for nonlinear SA. The second paper (https://arxiv.org/pdf/2401.15719.pdf) presents a non-asymptotic CLT analysis for functionals of a Markov chain, which can be used to analyze the linear SA problem, in particular the TD(0) algorithm. |
22.02.2024 | Vladimir Spokoiny (HSE University, IITP RAS, Humboldt University and WIAS Berlin) | Gaussian Variational Inference | Variational inference (VI) considers the problem of approximating a given measure (posterior) by a member of a given parametric family by minimizing the related KL-divergence. Gaussian VI focuses on the Gaussian approximation. We revisit the recent results by Katsevich and Rigollet (2023) and demonstrate how this problem can be solved using some advanced tools from optimization (the basic lemma). |
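A compact way to write the Gaussian VI problem mentioned above, in my own notation (not taken from the talk), is the following KL-projection onto the Gaussian family:

```latex
% A minimal formulation of the Gaussian VI objective (my notation, not the
% speaker's): approximate the target measure \pi by the closest Gaussian in KL.
\[
  (\hat m, \hat S) \in \operatorname*{arg\,min}_{m \in \mathbb{R}^d,\; S \succ 0}
  \mathrm{KL}\bigl(\mathcal{N}(m, S) \,\big\|\, \pi\bigr)
  = \operatorname*{arg\,min}_{m \in \mathbb{R}^d,\; S \succ 0}
  \mathbb{E}_{x \sim \mathcal{N}(m, S)}\bigl[\log \varphi_{m, S}(x) - \log \pi(x)\bigr],
\]
where \(\varphi_{m, S}\) denotes the density of \(\mathcal{N}(m, S)\).
```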
29.02.2024 | Yuri Svirshchevsky (HSE University) | Monte Carlo Variational Auto-Encoders | Variational autoencoders are generative models trained by maximizing a lower bound on the evidence of the observed data (the evidence lower bound, or ELBO). The quality of the trained model depends on the tightness of this bound. The seminar will cover the derivation of two ELBOs based on Annealed Importance Sampling and Sequential Importance Sampling, their differentiation, and their application in VAEs. |
07.03.2024 | Nikita Puchkin (HSE University) | Dimension-free Structured Covariance Estimation | Given a sample of i.i.d. high-dimensional centered random vectors, we consider a problem of estimation of their covariance matrix $\Sigma$ with an additional assumption that $\Sigma$ can be represented as a sum of a few Kronecker products of smaller matrices. Under mild conditions, we derive the first non-asymptotic dimension-free high-probability bound on the Frobenius distance between $\Sigma$ and a widely used penalized permuted least squares estimate. Because of the hidden structure, the established rate of convergence is faster than in the standard covariance estimation problem. |
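For context, the structural assumption described above can be sketched as follows (my notation; the number of terms and the block sizes are illustrative, not taken from the abstract):

```latex
% Schematic form of the structured covariance assumed in the talk (my notation):
% the covariance of a d_1 d_2-dimensional vector is a short sum of Kronecker products.
\[
  \Sigma \;=\; \sum_{r=1}^{R} A_r \otimes B_r,
  \qquad A_r \in \mathbb{R}^{d_1 \times d_1},\quad B_r \in \mathbb{R}^{d_2 \times d_2},\quad R \ll d_1 d_2,
\]
so that \(\Sigma\) is described by roughly \(R(d_1^2 + d_2^2)\) parameters instead of \((d_1 d_2)^2\).
```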
14.03.2024 | Alexander Osinsky (Skoltech, IVM RAS) | Existence and construction of column and cross approximations of matrices close to optimal: probabilistic estimates | Cross approximations of matrices are often used to approximate large data arrays, since they can be built from only a small number of rows and columns of the matrix. The algorithms used in practice, such as adaptive cross or maxvol, are based on the idea of maximizing the volume of a submatrix. Although this approach formally guarantees only a small entrywise error, the methods are usually highly accurate in the Frobenius norm as well, despite the existence of counterexamples. This talk addresses the efficiency of this approach for approximating randsvd matrices and shows that choosing submatrices by the maximum-volume principle makes it possible, with high probability (i.e., for most randsvd matrices), to achieve accuracy close to that of the singular value decomposition. |
28.03.2024 | Zhuo-Song Zhang (Southern University of Science and Technology) | Berry-Esseen bounds for generalized U-statistics |
We establish optimal Berry-Esseen bounds for generalized U-statistics. The proof is based on a new Berry-Esseen theorem for the exchangeable pair approach in Stein's method under a general linearity condition setting. As applications, optimal convergence rates of the normal approximation for subgraph counts in Erdős-Rényi and graphon random graphs are obtained. |
04.04.2024 | Vadim Popov (Huawei) | Advantages of additional training of diffusion probabilistic models using the Huber loss function | Diffusion probabilistic models are typically trained by minimizing the mean squared error between a certain quantity and its estimate produced by a neural network. The talk examines the Huber loss as a robust alternative to the mean squared error in the context of training diffusion models. We will demonstrate the practical advantages of the Huber loss in the case when a diffusion model is fine-tuned on data containing a substantial amount of noise. In addition, a theoretical result partially explaining the observed behavior will be formulated, together with a sketch of its proof. |
16.05.2024 | Andrey Kupavsky (MIPT) | Concentration inequalities for finding multi-colored combinations | Consider a k-partite k-uniform hypergraph on [n]^k. It is easy to see that any such hypergraph with more than (s-1)n^{k-1} edges contains a matching of size s. Aharoni and Berger proposed a rainbow analogue of this question: given s hypergraphs, each with more than (s-1)n^{k-1} edges, can we guarantee the existence of a matching in which the i-th edge comes from the i-th hypergraph? In this talk I will present an almost complete solution of this problem, based on a concentration inequality for the intersection of a family with a random matching. Joint work with Sergei Kiselev. |
23.05.2024 | Konstantin Yakovlev (MIPT) | Optimal score estimation via empirical Bayes smoothing | The problem of estimating the score function of an unknown probability distribution from n independent and identically distributed observations in d dimensions is studied. Assuming that the true density is sub-Gaussian and has a Lipschitz-continuous score function, the optimal rate for the estimation problem under the loss function commonly used in the score matching literature is established. This reveals the curse of dimensionality: the sample complexity for accurate score estimation grows exponentially with the dimension. Leveraging key insights from empirical Bayes theory, as well as a new convergence rate of the smoothed empirical distribution in Hellinger distance, it is shown that a regularized score estimator based on a Gaussian kernel attains this rate, which is shown to be optimal by a matching minimax lower bound. The implications of these findings for the sample complexity of score-based generative models are discussed. |
06.06.2024 | Alexander Korotin (Skoltech), Nikita Gushchin (Skoltech), Sergey Kholkin (Skoltech) | Building light Schrödinger bridges |
Schrödinger Bridges (SB) have recently gained attention of the ML community as a promising extension of classic diffusion models, which are also interconnected to the Entropic Optimal Transport (EOT). Despite the recent advances in the field of computational Schrödinger Bridges, most existing SB solvers are still heavyweight and require complex optimization of several neural networks. We address this issue and propose two novel light solvers for this problem: LightSB and LightSBM. Both utilize the optimal structure of Schrödinger Bridges but in different ways. The LightSB solver allows direct minimization of the KL-divergence to the ground-truth solution, knowing only the start and end marginals. The LightSBM solver is based on bridge matching and introduces the new concept of "optimal projection," allowing the Schrödinger Bridge to be solved in one bridge-matching iteration. |
13.06.2024 | Maria Rozanova (HSE University) | Low-rank matrix methods for incremental learning in recommendation systems | Incremental learning allows a model to adapt to new data by reusing previously computed results. This approach is useful in recommender systems, where new user-item interactions arrive constantly, since it allows the model to be updated much faster than full retraining. At the same time, incremental methods increase the stability of recommendations and can outperform classical approaches in services where users tend to form stable preferences that do not change much over short periods of time. The talk covers the previously proposed Projector Splitting Integrator method, which we adapted to the extended incremental problem with new users and items, and a new method called Reused Projector. |
02.08.2024 | Evgeny Golikov (EPFL) | A generalisation bound for nearly linear networks | We derive a novel generalisation bound for neural networks that becomes nonvacuous when the activation functions get close to linearity. To the best of our knowledge, our bound is the first nonvacuous generalisation bound which is computable prior to training. Our analysis is based on a novel concept of "proxy models", which use the weights of trained models with the activation functions removed. |
15.08.2024 | Nikolai Malkin (University of Edinburgh) | Amortising intractable inference with diffusion models and off-policy RL | I will present recent and ongoing work relating diffusion models, variational inference, and reinforcement learning. Efficient inference of high-dimensional variables under a diffusion model prior enables solutions to problems such as conditional generation, semantic segmentation, and combinatorial optimisation (https://arxiv.org/abs/2206.09012). A new family of amortised methods (https://arxiv.org/abs/2402.05098, https://arxiv.org/abs/2405.20971) places the problem of stochastic continuous control -- including sampling of posterior distributions under neural SDE priors -- in the theoretical framework of off-policy entropy-regularised reinforcement learning, via the formalism of continuous generative flow networks (https://arxiv.org/abs/2301.12594). This connection allows us to train diffusion models to sample a target density or energy function, i.e., to perform black-box variational inference. It further lets us fine-tune pretrained diffusion models in a data-free manner for asymptotically unbiased posterior sampling, with applications to vision (class-conditional generation, text-to-image generation), language (constrained generation in diffusion language models), and control/planning (KL-constrained policy extraction with a diffusion behaviour policy). These approaches are applicable in Bayesian machine learning and various scientific domains. |
05.09.2024 | Paul Mangold (École Polytechnique) | Taming Heterogeneity in Federated Linear Stochastic Approximation | In federated learning, multiple agents collaboratively train a machine learning model without exchanging local data. To achieve this, each agent locally updates a global model, and the updated models are periodically aggregated. In this talk, I will focus on federated linear stochastic approximation (FedLSA), with a strong focus on agent heterogeneity. I will derive upper bounds on the sample and communication complexity of FedLSA, and present a new method to reduce communication cost using control variates. Particular attention will be paid to the "linear speed-up" phenomenon, showing that the sample complexity scales with the inverse of the number of agents in both methods. |
12.09.2024 | Evgeny Stepanov (POMI RAS, Universita di Pisa, HSE University) | On some spectral methods of searching for hidden geometric structures in data | We will discuss some methods for reconstructing metric measure spaces (mainly Riemannian manifolds with the volume measure) based on spectral theory. Such methods include multidimensional scaling (MDS), Laplacian eigenmaps, and their relatives. Some results on the embeddings produced by these methods will be presented, as well as some open questions. |
19.09.2024 | Ekaterina Grishina (HSE University) and Alexandra Ivanova (HSE University, AIRI) | Part I: Accurate and effective estimation of the spectral norm of convolutional layers of neural networks. Part II: Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing |
Part I: The talk surveys methods for computing the spectral norm of a convolution. We will present a new bound on the spectral norm of a convolutional layer in terms of the spectral norm of its kernel, and discuss how this bound can be used to regularize convolutional neural networks. Part II: Diffusion models are widely used for editing real images. To achieve the desired edit, classifier-free guidance is usually employed, but it is not enough to preserve the structure of the original image. Guide-and-Rescale uses a self-guidance mechanism to preserve the parts of the image that should not change, which yields more stable editing. Moreover, Guide-and-Rescale requires no optimization or fine-tuning and is therefore computationally efficient. |
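As a toy illustration of Part I (my own sketch, not the speakers' estimate): for a single-channel convolution with circular padding, the spectral norm can be computed exactly from the 2D DFT of the zero-padded kernel, which the snippet below checks against the explicit operator matrix.

```python
# Toy check (my own sketch, not the speakers' method): for a single-channel
# circular convolution, the spectral norm equals max |FFT2(zero-padded kernel)|.
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
kernel = rng.standard_normal((k, k))
k_pad = np.zeros((n, n))
k_pad[:k, :k] = kernel

def circ_conv(x):
    # circular convolution of an n-by-n image with the kernel
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k_pad)))

# build the explicit (n^2 x n^2) matrix of the linear map x -> circ_conv(x)
M = np.zeros((n * n, n * n))
for i in range(n * n):
    e = np.zeros(n * n)
    e[i] = 1.0
    M[:, i] = circ_conv(e.reshape(n, n)).ravel()

print("max |FFT2(kernel)| :", np.abs(np.fft.fft2(k_pad)).max())
print("spectral norm of M :", np.linalg.norm(M, 2))   # should match
```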
26.09.2024 | Hoi To Wai (Chinese University of Hong Kong, CUHK) | Stochastic Approximation Algorithms with Decision-Dependent Data: The Case of Performative Prediction |
Stochastic approximation (SA) is the workhorse behind many online algorithms for decision making under uncertainty. In particular, it has sparked renewed interest in recent years in the dynamic environment setting, where the streaming data are not i.i.d. but correlated and/or decision-dependent. This renewed interest is due to its use in many contemporary applications, e.g., reinforcement learning, performative prediction, fine-tuning of large language models (LLMs). This talk will concentrate on SA algorithms applied to stochastic optimization problems with decision-dependent distributions, also known as performative prediction problems. We will first motivate the problem using an example of strategic classification and demonstrate that a natural implementation of the "stochastic gradient" algorithm with "greedy deployment" leads to an SA scheme with non-stochastic-gradient updates. Then, we present recent results on the convergence of such algorithms in the convex and non-convex settings, as well as in stateful and non-stateful agent environments. Finally, we demonstrate a few interesting extensions of the model towards the case of multi-agent learning, non-cooperative network games, fine-tuning of LLMs, etc. |
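A minimal toy instance of the "greedy deployment" scheme described above (entirely my own construction: a one-dimensional quadratic loss and a location family as the decision-dependent distribution):

```python
# Toy "greedy deployment" SGD for performative prediction (my own construction,
# not the speaker's code). The data distribution D(theta) = N(a - eps*theta, 1)
# shifts with the deployed decision theta; the loss is l(theta, z) = (theta - z)^2 / 2.
# A performatively stable point satisfies theta = E_{z ~ D(theta)}[z] = a - eps*theta,
# i.e. theta = a / (1 + eps).
import numpy as np

rng = np.random.default_rng(0)
a, eps = 2.0, 0.3
theta = 0.0
for k in range(1, 50001):
    z = a - eps * theta + rng.standard_normal()   # sample from D(theta_k): greedy deployment
    grad = theta - z                              # stochastic gradient of the loss at current theta
    theta -= grad / k                             # Robbins-Monro step size 1/k

print("final theta           :", theta)
print("performatively stable :", a / (1 + eps))
```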
03.10.2024 | Vladimir Spokoiny (HSE University, WIAS Berlin) | Inference for nonlinear inverse problems | A new approach to statistical inference is introduced and illustrated by the examples of nonlinear regression, the error-in-operator problem, and nonlinear elliptic PDEs. |
10.10.2024 | Anna Markovich (HSE University) | Real-time detection of time series irregularities based on prediction using expert strategies | The talk presents a new method for quickly detecting changes in the probability distribution of time series data. The method is based on estimating the score function via prediction algorithms over a continuum of experts with the squared loss. We present regret bounds for the algorithms, an application of the follow-the-best-compound-expert strategy to the change-point detection problem, and numerical experiments illustrating the behavior of the method on synthetic and real time series. |
17.10.2024 | Denis Rakitin (HSE University) | Regularized Distribution Matching Distillation for One-step Unpaired Image-to-image Translation | Diffusion distillation methods aim to compress diffusion models into one-step generators while keeping the quality as close to the original as possible. Among these methods, Distribution Matching Distillation (DMD) is a framework for training generators of arbitrary form, applicable beyond unconditional generation. The talk presents our recent joint work, which proposes a modification of DMD applicable to unpaired style transfer problems. We demonstrate its applicability to various domain translation tasks, including images, where our method achieves comparable and/or better quality than multi-step baselines. |
07.11.2024 | Evgeny Zheng (CNRS, Université Paris-Saclay) | Small Total-Cost Constraints in Contextual Bandits with Knapsacks |
I will talk about some recent developments in the literature on contextual bandit problems with knapsacks [CBwK], a problem where at each round a scalar reward is obtained and vector-valued costs are suffered. The goal is to maximize the cumulative rewards while ensuring that the cumulative costs are lower than some predetermined cost constraints. In this setting, total cost constraints had so far to be at least of order T^{3/4}, where T is the number of rounds, and were even typically assumed to depend linearly on T. Elaborating on the main technical challenge and drawback of the previous approaches, I will present a dual strategy based on projected-gradient-descent updates that is able to deal with total-cost constraints of the order of T^{1/2} up to poly-logarithmic terms. This strategy is direct and relies on a careful, adaptive tuning of the step size. The approach is inspired by parameter-free-type algorithms arising from the convex (online) optimization literature, which I also briefly review. The talk is based on joint works with C. Giraud, Z. Li, and G. Stoltz. |
14.11.2024 | Askar Tsyganov (HSE University) and Artemy Rubtsov (HSE University) | Part I: Asynchronous Stochastic Approximation and Q-Learning. Part II: Optimal Query Complexities for Dynamic Trace Estimation |
Part I: We consider the reinforcement learning problem without access to a generative model through the lens of asynchronous stochastic approximation, derive finite-time convergence rates for the Q-learning algorithm, and discuss some of its modifications that accelerate convergence to the optimal solution. |
28.11.2024 | Yazid Janati (École Polytechnique) | Variational Diffusion Posterior Sampling with Midpoint Guidance | Diffusion models have recently shown considerable potential in solving Bayesian inverse problems when used as priors. However, sampling from the resulting denoising posterior distributions remains a challenge as it involves intractable terms. To tackle this issue, state-of-the-art approaches formulate the problem as that of sampling from a surrogate diffusion model targeting the posterior and decompose its scores into two terms: the prior score and an intractable guidance term. While the former is replaced by the pre-trained score of the considered diffusion model, the guidance term has to be estimated. In this paper, we propose a novel approach that utilises a decomposition of the transitions which, in contrast to previous methods, allows a trade-off between the complexity of the intractable guidance term and that of the prior transitions. We also show how the proposed algorithm can be extended to handle the sampling of arbitrary unnormalised densities. We validate the proposed approach through extensive experiments on linear and nonlinear inverse problems, including challenging cases with latent diffusion models as priors. |
Fall 2023
Date | Speaker | Title | Abstract |
14.09.2023 |
Vladimir Spokoiny (HSE University, IITP RAS, Humboldt University and WIAS Berlin) |
Deviation bounds for the norm of a random vector under exponential moment conditions with applications |
Hanson-Wright inequality provides a powerful tool for bounding the norm \( \| \xi \| \) of a centered stochastic vector \( \xi \) with sub-gaussian behavior. This paper extends the bounds to the case when \( \xi \) only has bounded exponential moments of the form \( \log E \exp \langle V^{-1} \xi,u \rangle \leq \| u \|^{2}/2 \), where \( V^{2} \geq \mathrm{Var}(\xi) \) and \( \| u \| \leq g \) for some fixed \( g \). For a linear mapping \( Q \), we present an upper quantile function \( z_{c}(B,x) \) ensuring \( P(\| Q \xi \| > z_{c}(B,x)) \leq 3 e^{-x} \) with \( B = Q \, V^{2} Q^{T} \). The obtained results exhibit a phase transition effect: with a value \( x_{c} \) depending on \( g \) and \( B \), for \( x \leq x_{c} \), the function \( z_{c}(B,x) \) replicates the case of a Gaussian vector \( \xi \), that is, \( z_{c}^{2}(B,x) = \operatorname{tr}(B) + 2 \sqrt{x \operatorname{tr}(B^{2})} + 2 x \| B \| \). For \( x > x_{c} \), the function \( z_{c}(B,x) \) grows linearly in \( x \). The results are specified to the case of Bernoulli vector sums and to covariance estimation in the Frobenius norm. |
21.09.2023 |
Arthur Goldman (HSE University) | RL with friends: how many agents does it take to solve an environment? | As part of the talk, we will get acquainted with the main definitions and results of the field related to the study of multi-agent reinforcement learning and mean-field games. |
05.10.2023 | Nikita Puchkin (HSE University) | Sharper dimension-free bounds on the Frobenius distance between sample covariance and its expectation | We consider the problem of covariance estimation from a sample of high-dimensional random vectors. We derive dimension-free bounds on the squared Frobenius distance between sample covariance and its expectation, which significantly improve over the existing results. |
12.10.2023 | Fedor Noskov (HSE University, MIPT, Skoltech) | Analysis of Boolean functions | At the seminar we will discuss general techniques of Boolean analysis, as well as the consequences these techniques have in learning theory, approximation theory, and combinatorics. The main subject of our discussion will be the Fourier analysis of Boolean functions. We will also briefly touch on the hypercontractivity phenomenon and its various applications. |
19.10.2023 | Denis Rakitin (HSE University) | Generative models based on ODEs/SDEs | Following the enormous success of diffusion models in various generation tasks, more and more models that operate through a dynamics have appeared recently. One of the most convenient and expressive frameworks here is ordinary and stochastic differential equations (in terms of which diffusion models themselves can also be formalized). At the seminar we will discuss the basic Flow Matching model and its generalizations, which make it possible to solve the generation task, paired tasks (including inverse problems such as super-resolution or deblurring), and unpaired style transfer formulated in terms of optimal transport. |
09.11.2023 | Evgeny Frolov (HSE University, AIRI) | Learning from sequences using tensor decompositions | The talk is devoted to the use of tensor decompositions and efficient numerical linear algebra for building an analogue of the attention model (self-attention) on sequences of user actions. This approach can be applied both to the standard task of predicting users' next actions and to a wider range of problems related to the analysis of incomplete multidimensional time series. |
14.12.2023 | Alexander Tarakanov (HSE University, Odnoklassniki) | Application of machine learning methods for computing eigenfunctions of differential operators |
The graph Laplacian is routinely used in machine learning, for example for clustering or in semi-supervised learning. These applications rely on the eigenfunctions of the Laplacian of a graph built from the dataset. These functions are closely related to the geometry of the manifold on which the dataset lies: in the limit, the eigenfunctions of the graph Laplacian converge to the eigenfunctions of the Laplace-Beltrami operator on the manifold. This explains the high efficiency of graph-based methods in clustering and semi-supervised learning. Graph-based methods have many advantages, but also significant drawbacks. First, the time needed to compute the eigenfunctions of the graph Laplacian grows quickly with the dataset size. Second, to make predictions at a point not present in the training set, one has to rebuild the graph, recompute the eigenfunctions, and re-solve the regression, classification or clustering problem. Ideally, one would like to evaluate the eigenfunctions at new points without additional training. This work solves that problem. Namely, we propose an algorithm for approximating the eigenfunctions of a related differential operator (the Dirichlet energy) with neural networks. In the new algorithm, the eigenfunctions are computed sequentially by minimizing the Dirichlet energy subject to orthogonality to the already approximated functions and a normalization constraint. The problem thus reduces to minimizing three quantities: the norm of the projection of the current function onto the subspace spanned by the already constructed functions, the deviation of its norm from one, and the Dirichlet energy. One way to solve such a problem is to minimize an objective given by a weighted sum of these three quantities. The work proposes a new algorithm for constructing the objective: the weights are chosen so that the network enforces the conditions sequentially: first orthogonality, then normalization while keeping orthogonality (for stability of the algorithm), and only then the Dirichlet energy is minimized subject to the orthogonality and normalization constraints. In addition, a regularization scheme is proposed that makes the method applicable to small datasets and improves the stability of the eigenfunction computation. Examples of applying the new method to synthetic and real datasets are considered. |
Spring 2023
Date | Speaker | Title | Abstract |
17.01.2023 |
Alexandra Senderovich (HSE University) |
Norm and trace estimation with random rank-one vectors (paper overview) |
To estimate the norm of an arbitrary matrix or the trace of a symmetric positive semidefinite matrix, a few multiplications of the matrix by random vectors are often enough. Using rank-1 vectors, i.e., Kronecker products of smaller random vectors, it is possible to speed up the computation of such an estimate by making both sampling and matrix multiplication faster. The paper shows that with rank-1 vectors sampled from a normal or Rademacher distribution, the probability of obtaining a good norm estimate deteriorates only slightly compared to using ordinary vectors from the same distributions. The authors also conducted numerical experiments confirming the derived probabilities and showing the advantages of the rank-1 approach. |
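A minimal numerical sketch of the idea (my own illustration, not code from the paper): both ordinary Rademacher probes and their rank-1 Kronecker counterparts give unbiased Hutchinson-type trace estimates.

```python
# Hutchinson-type trace estimation with ordinary Rademacher probes vs.
# rank-1 (Kronecker) probes -- my own illustration, not code from the paper.
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 20, 30
n = n1 * n2
B = rng.standard_normal((n, n))
A = B @ B.T                       # symmetric positive semidefinite test matrix

def hutchinson(probes):
    # E[z^T A z] = tr(A) when E[z z^T] = I
    return np.mean([z @ A @ z for z in probes])

m = 300
plain = [rng.choice([-1.0, 1.0], size=n) for _ in range(m)]
rank1 = [np.kron(rng.choice([-1.0, 1.0], size=n1),
                 rng.choice([-1.0, 1.0], size=n2)) for _ in range(m)]

print("true trace        :", np.trace(A))
print("plain Rademacher  :", hutchinson(plain))
print("rank-1 (Kronecker):", hutchinson(rank1))
```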
31.01.2023 | Dmitry Lishudi (HSE University) | "How benign is benign overfitting?" | On many datasets, neural networks trained with SGD achieve almost zero error on the training data and generalize well on test data, even when the training labels are noisy. This effect is called benign overfitting. Nevertheless, such models are vulnerable to adversarial attacks. The first reason for this vulnerability is poor sampling: theoretical and empirical analysis shows that label noise is one of the causes of adversarial vulnerability, and robust models fail to achieve zero training error. However, removing incorrect labels does not by itself achieve resilience to adversarial attacks. The paper suggests that it is also a matter of suboptimal representation learning. Using a simple theoretical setting, it is shown that the choice of representations can significantly affect adversarial robustness. |
07.02.2023 - 12.02.2023 |
|
Winter school "Mathematics of Machine Learning" |
https://cs.hse.ru/iai/hdilab/mml2023/ |
14.02.2023 |
Nikita Puchkin (HSE University) |
Exploring Local Norms in Exp-concave Statistical Learning |
We consider the standard problem of stochastic convex optimization with exp-concave losses using Empirical Risk Minimization in a convex class. Answering a question raised in several prior works, we provide a $O\bigl( d/n + \log(1/\delta)/n \bigr)$ excess risk bound valid for a wide class of bounded exp-concave losses, where d is the dimension of the convex reference set, n is the sample size, and $\delta$ is the confidence level. Our result is based on a unified geometric assumption on the gradient of losses and the notion of local norms. |
21.02.2023 |
Denis Belomestny (HSE University, Duisburg Essen University) |
A Reproducing Kernel Hilbert Space approach to singular local stochastic volatility McKean-Vlasov models |
Motivated by the challenges related to the calibration of financial models, we consider the problem of numerically solving a singular McKean-Vlasov equation. |
28.02.2023 |
Sergey Samsonov (HSE University) |
Finite-time high-probability bounds for linear stochastic approximation |
We discuss a finite-time analysis of linear stochastic approximation (LSA) algorithms, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a d-dimensional linear system, where both the system matrix and the right-hand side can only be estimated through (asymptotically) unbiased observations. We consider here the case where the observations are driven either by a sequence of i.i.d. noise variables or by a uniformly geometrically ergodic Markov chain. We discuss p-th moment inequalities and high-probability bounds for the LSA iterates and their Polyak-Ruppert averaged version, focusing on tight dependence on the mixing time of the chain. We then prove finite-time instance-dependent bounds on the Polyak-Ruppert averaged sequence of iterates. These results are sharp in the sense that the leading term we obtain matches the local asymptotic minimax limit. We also cover applications to temporal-difference methods in RL. |
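For readers new to the setting, below is a minimal sketch of LSA with Polyak-Ruppert averaging under i.i.d. noise (a toy example of my own, not the speaker's code):

```python
# Linear stochastic approximation with Polyak-Ruppert averaging (toy sketch):
#   theta_{k+1} = theta_k - alpha_k * (A_k theta_k - b_k),
# where (A_k, b_k) are unbiased noisy observations of (A, b) with A theta* = b.
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = np.eye(d) + 0.1 * rng.standard_normal((d, d))
A = A @ A.T + np.eye(d)          # symmetric positive definite, so the recursion is stable
theta_star = rng.standard_normal(d)
b = A @ theta_star

theta = np.zeros(d)
avg = np.zeros(d)
for k in range(1, 20001):
    A_k = A + 0.5 * rng.standard_normal((d, d))   # noisy system matrix
    b_k = b + 0.5 * rng.standard_normal(d)        # noisy right-hand side
    alpha_k = 0.5 / k ** 0.7                      # decreasing step size
    theta = theta - alpha_k * (A_k @ theta - b_k)
    avg += (theta - avg) / k                      # running Polyak-Ruppert average

print("error of last iterate :", np.linalg.norm(theta - theta_star))
print("error of PR average   :", np.linalg.norm(avg - theta_star))
```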
07.03.2023 | Maxim Vasiliev (HSE University) |
Approximation and sampling of multivariate probability distributions in the tensor train decomposition (papers overview) |
The talk discusses sampling of arbitrary absolutely continuous random variables whose density can be approximated by a tensor train. This allows one to build an MCMC algorithm whose complexity grows linearly with the dimension. We will then discuss modifications of this algorithm based on approximating the square root of the density and on composing density approximations. We will discuss theoretical convergence guarantees of the algorithms, as well as their application in various settings: estimating probabilities of rare events, importance sampling, and estimation for differential equations. |
14.03.2023 | Fedor Noskov (HSE University, MIPT, Skoltech) |
Optimal Estimation in Mixed-Membership Stochastic Block Model |
Community detection is one of the central problems in modern network science. It has numerous applications in the analysis of social and biological networks, designing network protocols and many other areas. Recently, much attention has been paid to the detection of overlapping communities, where each node in a network may belong to multiple groups. We consider the parameter estimation problem in Mixed Membership Stochastic Block Model (MMSB), which is a quite general instance of a random graph model allowing for overlapping community structure. We discuss different approaches to parameter estimation in this model, their theoretical properties and practical performance. Finally, we propose an approach to improve over existing estimates that allows obtaining minimax optimal rates of estimation for all the model parameters. |
23.03.2023 | Vladimir Spokoiny (HSE University, IITP RAS, Humboldt University and WIAS Berlin) |
Inference in Error-in-Operator model |
We consider the problem of recovering the source signal \( x \) from the noisy observation \( Y \) given by the equation \( Y = A x + \epsilon \) in the situation when the operator \( A \) is not precisely known. Instead, a pilot estimate \( \hat{A} \) is available. A new approach is suggested which allows us to obtain nearly sharp finite-sample accuracy guarantees and to incorporate the smoothness of the signal \( x \) and the operator \( A \) in a rather general form. |
28.03.2023 | Evgeny Lagutin (HSE University) |
Stable Rank Normalization for Improved Generalization in Neural Networks and GANs (paper overview) |
Exciting new work on generalization bounds for neural networks (NNs) by Bartlett et al. (2017) and Neyshabur et al. (2018) closely depends on two parameter-dependent quantities: the Lipschitz constant upper bound and the stable rank (a softer version of rank). Even though these bounds typically have minimal practical utility, they raise the question of whether controlling such quantities together could improve the generalization behaviour of NNs in practice. Spectral Normalization (Miyato et al., 2018) is a technique that makes it easy to restrict the Lipschitz constant upper bound and is thus a popular way to stabilize the training of generative adversarial networks and to make a general discriminative neural network robust. Stable Rank Normalization (Sanyal et al., 2020), on the other hand, tackles the problem of restricting the stable rank of a neural network's layers. Motivated by the generalization bound view, this technique in combination with Spectral Normalization further improves robustness on classification tasks and quality on image generation tasks. |
04.04.2023 | Evgeny Nikishin (Mila, University of Montreal) | Deep Reinforcement Learning with Plasticity Injection | A growing body of evidence suggests that neural networks employed in deep reinforcement learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the analysis and mitigation of this phenomenon is hampered by the complex relationship between plasticity, exploration, and performance in RL. We introduce plasticity injection, a minimalistic intervention that increases the network plasticity without changing the number of trainable parameters or biasing the predictions. The applications of this intervention are two-fold: first, as a diagnostic tool — if injection increases the performance, we may conclude that an agent's network was losing its plasticity. This tool allows us to identify a subset of Atari environments where the lack of plasticity causes performance plateaus, motivating future studies on understanding and combating plasticity loss. Second, plasticity injection can be used to improve the computational efficiency of RL training if the agent exhausted its plasticity and has to re-learn from scratch or by growing the agent's network dynamically without compromising performance. The results on Atari show that plasticity injection attains stronger performance compared to alternative methods while being computationally efficient. |
20.04.2023 | Daniil Tyapkin (HSE University) | Fast Rates for Maximum Entropy Exploration | We consider the reinforcement learning (RL) setting, in which the agent has to act in unknown environment driven by a Markov Decision Process (MDP) with sparse or even reward free signals. In this situation, exploration becomes the main challenge. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization that was previously considered by Hazan et al. 2019 in the discounted setting. For this type of exploration, we propose an algorithm based on a game theoretic representation that has O(HSA/eps^2) sample complexity thus improving the eps- and S-dependence of Hazan, where S is a number of states, A is a number of actions, H is an episode length, and eps is a desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to the entropy-regularized MDPs, and we propose a simple modification of the UCBVI algorithm that has a sample complexity of order 1/eps ignoring dependence in S,A,H. Interestingly enough, it is the first theoretical result in RL literature establishing that the exploration problem for the regularized MDPs can be statistically strictly easier (in terms of sample complexity) than for the ordinary MDPs. |
27.04.2023 | Arthur Goldman (HSE University) | Theoretical guarantees for neural control variates in MCMC | In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlying Markov chain. The proposed approach relies upon recent results on the stochastic errors of variance reduction algorithms and function approximation theory. |
01.05.2023 | Vladimir Spokoiny (HSE University, IITP RAS, Humboldt University and WIAS Berlin) | Sharp deviation bounds for quadratic forms and their use in machine learning | Under a mild assumption of linearity on the log-likelihood function, the tools of empirical processes are not necessary, the analysis of a general MLE can be reduced to a deviation bound for a quadratic form. This paper explains how the recent advances in Laplace approximation from [Spokoiny, 2022, 2023] can be used for obtaining sharp Gaussian-like finite sample bounds and for stating the prominent concentration phenomenon for the squared norm of a sub-gaussian vector. Some extensions and open problems will be discussed as well. |
04.05.2023 | Yana Hassan (HSE University) | Fast Rates for Maximum Entropy Exploration | Nowadays it is becoming extremely popular to tune large pretrained language models with expert feedback using the reinforcement learning with human feedback methodology. In my talk I will focus on a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). The analysis from the paper shows that when the true reward function is linear, the widely used maximum likelihood estimator (MLE) converges under both the Bradley-Terry-Luce (BTL) model and the Plackett-Luce (PL) model. The results validate the empirical success of existing RLHF algorithms in InstructGPT and provide new insights for algorithm design. I will also discuss a unified analysis of RLHF and max-entropy Inverse Reinforcement Learning (IRL), and present the first sample complexity bound for max-entropy IRL. |
11.05.2023 | Alexander Podkopaev (Carnegie Mellon University) | Nonparametric two-sample and independence testing by betting | Two-sample and independence testing are two fundamental problems of statistical inference. In the former, given observations from two distributions, the goal is to test the null hypothesis that the distributions are equal against the alternative that they are not. In the latter, given observations drawn from an unknown joint distribution, the goal is to test the null that the underlying random variables are independent against the alternative that they are not. Both problems have been extensively studied in the batch setting, where the sample size is specified before collecting data. The main issue is that in general composite nonparametric settings, even if the null is false, it is unknown beforehand how much data will be "enough" to reject. Sequential testing is a complementary approach in which an incoming data stream is analyzed online. In this talk, I will discuss our recent sequential two-sample and independence tests that (a) continuously monitor the data while controlling the false alarm rate, (b) are consistent, meaning that they are guaranteed to stop and reject if the null is false, and (c) provably adapt to the complexity of a problem and stop earlier on easy tasks / later on harder ones. |
18.05.2023 | Ulyana Vinogradova (HSE University) | Leveraging covariate adjustments at scale in online A/B testing | Various companies run randomized online experiments to estimate the "causal impact" of newly adopted features on the performance of metrics. The difference-in-means estimator is still one of the most commonly used methods in many online A/B testing platforms due to its simplicity. However, this estimator often fails to detect small effects of the intervention, even though the experiment contains thousands or millions of customers. A potential solution to this issue is to develop an alternative estimator that leverages the abundance of additional data (covariates) collected during the experiments. In this talk, I will discuss a recently proposed general-purpose algorithm for the estimation of causal effects with covariates in the setting of online A/B testing. I will also talk about the benefits, costs, and risks of allowing experimenters to leverage more complicated estimators that use covariates to estimate causal effects. |
25.05.2023 | Ivan Safonov (HSE University) | Tensorization of flows: a tool for variational inference |
Consider the classical problem of approximating a given distribution, known up to a normalizing constant, by a member of a given class of distributions. Normalizing flows can be applied to this problem, but the unimodality of the Gaussian reference distribution used in a normalizing flow may cause difficulties when approximating multimodal distributions. We introduce an extension of normalizing flows in which the Gaussian reference is replaced by a reference distribution (a tensorizing flow) constructed with a tensor network. The tensorizing flow combined with a normalizing flow gives better results than either of them individually. In this talk, I will discuss the recently proposed tensorizing flow, how to sample from it, how to train it, and how to apply it to variational inference together with normalizing flows. The tensorizing flow construction is similar to the one described in Sergey Dolgov's articles, but it has some interesting differences that we will discuss. |
08.06.2023 | Ivan Peshekhonov and Alexey Arzhantsev (HSE University) | Exploring the Benefits of Riemannian Optimization on Shared Factor Tensor Manifolds | Factorization of a matrix into a product of two rectangular ones, called factors, is a classic tool in various machine learning applications. Tensor factorizations generalize this concept to more than two dimensions. In certain applications, where some of the tensor dimensions encode the same objects (e.g., knowledge base graphs with entity-relation-entity 3D tensors), it may also be beneficial for the respective factors to be shared. In this paper, we consider a well-known Tucker tensor factorization and study its properties under the shared factor constraint. We call it a shared-factor Tucker factorization (SF-Tucker). We prove that a manifold of fixed rank SF-Tucker tensors forms a smooth manifold and develop efficient algorithms for the main ingredients of Riemannian optimization: projection to a tangent plane for general-purpose loss functions and retraction. Having these tools at hand, we also implement a Riemannian optimization method with momentum and showcase the benefits of our approach on several machine learning tasks. |
15.06.2023 |
Irina Goloborodko (HSE University) |
Jacobian regularization of tensor decompositions in neural networks | Tensor decompositions can be used as a compression tool for neural networks. For this purpose, the weight tensor of a fully connected or convolutional layer is replaced in the architecture by its tensor decomposition. The disadvantage of this approach is the instability of the compressed neural network for some types of tensor decompositions. To improve the stability of the compressed network, a randomized Jacobian spectral regularizer can be applied. The regularizer relies on matvec functions of the Jacobian and its transpose and can be efficiently computed using automatic differentiation tools. This makes the method highly versatile and applicable in different scenarios. |
29.06.2023 | Nikita Morozov (HSE University) | Introduction to Generative Flow Networks (GFlowNets) | GFlowNets are a class of generative models trained to generate objects through a sequence of constructive actions, with probability proportional to a given reward function R(x). One of the key features of such models is the ability to efficiently sample from different modes of both discrete and continuous distributions. The talk covers the mathematical construction underlying these models, as well as a set of standard approaches to training them. |
Fall 2022
Date | Speaker | Title | Abstract |
19.09.2022 |
Denis Belomestny (HSE University, Duisburg Essen University) |
Statistics of McKean-Vlasov equations |
In this talk we study the problem of semiparametric estimation for a class of McKean-Vlasov stochastic differential equations. Our aim is to estimate the drift coefficient of a MV-SDE based on observations of the corresponding particle system. We propose a semiparametric estimation procedure and derive the rates of convergence for the resulting estimator. We further prove that the obtained rates are essentially optimal in the minimax sense. |
26.09.2022 |
Alexander Tarakanov (HSE University) |
Multi-Level Unbiased MCMC Estimators for Discretized Models |
ODE and PDE models are routinely used in different applications. A significant part of practical questions can be reduced to integration over a distribution on the ODE/PDE parameter space. A practical way of solving such problems is MCMC integration coupled with a spatial/temporal discretization of the ODE/PDE model. Such discretization introduces bias into the MCMC integration. This talk discusses a novel multi-level unbiased MCMC scheme. The technique provides an unbiased estimator with finite variance and finite expected computational cost for a given quantity of interest under reasonable smoothness constraints. Application examples are provided. |
03.10.2022 |
Yana Khassan (HSE University) |
Is Q-learning Provably Efficient? (Article overview) |
Model-free reinforcement learning (RL) algorithms, such as Q-learning, directly parameterize and update value functions or policies without explicitly modeling the environment. They are typically simpler, more flexible to use, and thus more prevalent in modern deep RL than model-based approaches. However, empirical work has suggested that model-free algorithms may require more samples to learn. The theoretical question of “whether model-free algorithms can be made sample efficient” is one of the most fundamental questions in RL, and remains unsolved even in the basic scenario with finitely many states and actions. The paper proves that, in an episodic MDP setting, Q-learning with UCB exploration achieves almost optimal regret. This is the first analysis in the model-free setting that establishes O(√T) regret without requiring access to a “simulator.” |
10.10.2022 |
Dmitry Molchanov (HSE University) |
Doubly stochastic variational inference with semi-implicit and improper distributions |
Doubly stochastic variational inference (DSVI) is one of the main methods of approximate inference and learning in modern deep probabilistic models. However, this method imposes substantial restrictions on the distributions that can be used and has a limited range of applicability. The main goal of my work is to relax these restrictions and to extend the toolbox for working with deep probabilistic models. Using the variational dropout model as an example, we show how lifting the restrictions on the variational parameters opens up new regimes of the model: a sparsification regime and a "variance network" regime. We then generalize DSVI to models with so-called semi-implicit distributions. The proposed approach, Doubly Semi-Implicit Variational Inference, makes it possible to perform inference in new classes of models, to refine inference in existing models, to tighten marginal likelihood estimates, and to speed up inference in models with mixture distributions. |
24.10.2022 |
Evgenii Nikishin (Mila, Université de Montréal) |
The Primacy Bias in Deep Reinforcement Learning |
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance. |
31.10.2022 |
Valeriia Shcherbakova (HSE University) |
A Contrastive Approach to Online Change Point Detection |
We suggest a novel procedure for online change point detection. Our approach expands an idea of maximizing a discrepancy measure between points from pre-change and post-change distributions. This leads to a flexible procedure suitable for both parametric and nonparametric scenarios. We prove non-asymptotic bounds on the average running length of the procedure and its expected detection delay. The efficiency of the algorithm is illustrated with numerical experiments on synthetic and real-world data sets. |
1.11.2022-3.11.2022 |
Autumn School |
Fall into ML 2022 |
|
07.11.2022 |
Artur Goldman (HSE University) |
Theoretical guarantees for approximate sampling from smooth and log-concave densities |
Sampling from various kinds of distributions is an issue of paramount importance in statistics. In many situations, the exact sampling from a given distribution is impossible or computationally expensive and, therefore, one needs to resort to approximate sampling strategies. However, there is no well-developed theory providing meaningful nonasymptotic guarantees for the approximate sampling procedures, especially in the high-dimensional problems. This paper makes some progress in this direction by considering the problem of sampling from a distribution having a smooth and log-concave density. Authors establish nonasymptotic bounds for the error of approximating the target distribution by the one obtained by the Langevin Monte Carlo method and its variants, and illustrate the effectiveness of the established guarantees with various experiments. |
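A minimal sketch of the Langevin Monte Carlo method discussed above (my own toy example with a Gaussian target; the step size and iteration counts are illustrative):

```python
# Unadjusted Langevin algorithm (ULA) targeting p(x) ∝ exp(-f(x)) -- my own toy
# example with a smooth, log-concave (Gaussian) target, not code from the paper.
import numpy as np

rng = np.random.default_rng(2)
d = 2
Sigma_inv = np.array([[2.0, 0.5], [0.5, 1.0]])   # target: zero-mean Gaussian

def grad_f(x):
    # f(x) = 0.5 * x^T Sigma_inv x, so grad f(x) = Sigma_inv x
    return Sigma_inv @ x

step = 0.05
x = np.zeros(d)
samples = []
for _ in range(20000):
    x = x - step * grad_f(x) + np.sqrt(2 * step) * rng.standard_normal(d)
    samples.append(x.copy())

samples = np.array(samples[5000:])               # discard burn-in
print("empirical covariance:\n", np.cov(samples.T))
print("target covariance:\n", np.linalg.inv(Sigma_inv))
```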
14.11.2022 |
Daria Demidova (HSE University) |
Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices |
This paper studies the Unadjusted Langevin Algorithm (ULA) for sampling from a probability distribution \( \nu \propto e^{-f} \) on \( \mathbb{R}^n \). The authors prove a convergence guarantee in Kullback-Leibler (KL) divergence assuming that \( \nu \) satisfies a log-Sobolev inequality and the Hessian of \( f \) is bounded. Notably, convexity or bounds on higher derivatives are not assumed. The authors also prove convergence guarantees in Rényi divergence of order \( q > 1 \) assuming the limit of ULA satisfies either the log-Sobolev or the Poincaré inequality, as well as a bound on the bias of the limiting distribution of ULA assuming third-order smoothness of \( f \), without requiring isoperimetry. |
21.11.2022 and 28.11.2022 |
Anton Zubekhin (HSE University) |
Diffusion Probabilistic Models explosion |
Diffusion probabilistic models are the state-of-the-art method for image generation. Moreover, diffusion models also perform well on data of other modalities, such as speech and text. Text-to-image diffusions are the most well-known models of this kind, with DALL-E 2, Imagen, and Stable Diffusion being famous even outside the ML community. Recent advances in machine learning show an even wider applicability of diffusion models: they can be used for video generation, 3D reconstruction, meta-learning, and even direct optimization of other neural networks. |
05.12.2022 |
Varvara Rudenko (HSE University) |
Log-concave sampling: Metropolis-Hastings algorithms are fast |
The authors study the problem of sampling from a strongly log-concave density supported on \( \mathbb{R}^d \) and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). The method draws samples by simulating a Markov chain obtained from the discretization of an appropriate Langevin diffusion, combined with an accept-reject step. Relative to known guarantees for the unadjusted Langevin algorithm (ULA), their bounds show that the accept-reject step in MALA leads to an exponentially improved dependence on the error tolerance. They also demonstrate the gains of a modified version of MALA over ULA for weakly log-concave densities. Furthermore, they derive mixing time bounds for the Metropolized random walk (MRW), whose mixing time is slower than that of MALA by a factor of order \( \kappa \). They provide numerical examples that support their theoretical findings and demonstrate the benefits of the Metropolis-Hastings adjustment for Langevin-type sampling algorithms. |
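For comparison with plain ULA, here is a minimal sketch of the MALA accept-reject step (my own toy example with a standard Gaussian target, not code from the paper):

```python
# Metropolis-adjusted Langevin algorithm (MALA) -- my own toy example,
# targeting p(x) ∝ exp(-f(x)) with f(x) = ||x||^2 / 2 (standard Gaussian).
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return 0.5 * x @ x

def grad_f(x):
    return x

def log_q(y, x, step):
    # log density (up to a constant) of the proposal y ~ N(x - step*grad_f(x), 2*step*I)
    mean = x - step * grad_f(x)
    return -np.sum((y - mean) ** 2) / (4 * step)

step, d = 0.2, 2
x = np.zeros(d)
samples, accepted = [], 0
for _ in range(20000):
    y = x - step * grad_f(x) + np.sqrt(2 * step) * rng.standard_normal(d)
    log_alpha = -f(y) + f(x) + log_q(x, y, step) - log_q(y, x, step)
    if np.log(rng.uniform()) < log_alpha:       # Metropolis-Hastings accept-reject
        x, accepted = y, accepted + 1
    samples.append(x.copy())

print("acceptance rate:", accepted / len(samples))
```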
12.12.2022 |
Marina Sheshukova (HSE University) |
Exponential Concentration Inequalities for Additive Functionals of Markov Chains |
Using the renewal approach we prove exponential inequalities for additive functionals and empirical processes of ergodic Markov chains, thus obtaining counterparts of inequalities for sums of independent random variables. The inequalities do not require functions of the chain to be bounded and moreover all the involved constants are given by explicit formulas whenever the usual drift condition holds, which may be of interest in practical applications e.g. to MCMC algorithms. |
19.12.2022 |
Timofey Grinev (HSE University) |
Sharper Bounds for Uniformly Stable Algorithms |
Deriving generalization bounds for stable algorithms is a classical question in learning theory. This paper is devoted to two questions: first, it provides a short proof of a moment bound that implies a generalization bound stronger than recent results. Second, it proves general lower bounds showing that this moment bound is sharp (up to a logarithmic factor) unless some additional properties of the corresponding random variables are used. The main probabilistic result is a general concentration inequality for weakly correlated random variables, which may be of independent interest. |
Date | Speaker | Title | Abstract |
13.01.2021 |
Vladimir Spokoiny (WIAS, IITP RAS, HSE University) |
Bayesian inference in machine learning |
Motivated by examples from nonparametric binary classification, ranking problems, and the analysis of random graphs, we consider the problem of statistical inference for binary data within a nonparametric Bernoulli model. In many situations the underlying Bernoulli parameter may approach the edges zero or one, which leads to inference for non-regular models. Properties like asymptotic normality of the MLE or of the posterior distribution are not valid in such irregular cases. The approach of this paper suggests imposing a Gaussian prior on the parameter of the logistic regression function, or, analogously, considering a penalized MLE of the canonical link with a quadratic penalty. A Gaussian prior can be viewed as a kind of barrier preventing the canonical parameter from approaching infinity and forcing it to stay within the region of regularity. Our main results state finite sample analogs of the classical statistical results, such as asymptotic normality of the posterior distribution or asymptotic normality of the penalized MLE, for any configuration of Bernoulli parameters under the assumption that the prior is strong enough in a certain direction. We also demonstrate how the results can be applied to ranking problems and nonparametric binary classification. |
09.02.2021 |
Nikita Puchkin (HSE University, IITP) |
Rates of convergence for density estimation with GANs |
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p$ in terms of Jensen-Shannon (JS) divergence at the rate $n^{-2\beta/(2\beta+d)}$ where $n$ is the sample size and $\beta$ determines the smoothness of $p$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta>d/2$. |
16.02.2021 |
Sergey Samsonov (HSE University) |
On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning |
This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. It is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online-learning or reinforcement learning). The existing results impose strong conditions such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chains. Our main contribution is an exponential stability result for the p-th moment of random matrix product, provided that (i) the underlying Markov chain satisfies a super-Lyapunov drift condition, (ii) the growth of the matrix-valued functions is controlled by an appropriately defined function (related to the drift condition). Using this result, we give finite-time p-th moment bounds for constant and decreasing stepsize linear stochastic approximation schemes with Markovian noise on general state space. We illustrate these findings for linear value-function estimation in reinforcement learning. We provide finite-time p-th moment bound for various members of temporal difference (TD) family of algorithms. |
02.03.2021 |
Dmitry Yarotsky (Skoltech) |
Approximation with deep neural networks |
In recent years, there has been quite a number of new results on approximation properties of deep neural networks. This emergent theory is rather different from the classical approximation theory of linear or shallow models (such as splines or Fourier expansion). In this talk, I will try to highlight some interesting ideas of this theory, and will also briefly indicate its connections to other topics such as information theory, dynamical systems, fewnomials and VC dimension. |
09.03.2021 |
Nikita Puchkin (HSE, IITP) |
Exponential Savings in Agnostic Active Learning through Abstention |
We show that in pool-based active classification without assumptions on the underlying distribution, if the learner is given the power to abstain from some predictions by paying the price marginally smaller than the average loss 1/2 of a random guess, exponential savings in the number of label requests are possible whenever they are possible in the corresponding realizable problem. We extend this result to provide a necessary and sufficient condition for exponential savings in pool-based active classification under the model misspecification. |
16.03.2021 |
Nikita Zhivotovskiy (ETH, Zurich) |
Distribution-Free Robust Linear Regression |
We consider the problem of random-design linear regression, in a distribution-free setting where no assumption is made on the distribution of the predictive/input variables. After surveying existing approaches and indicating some improvements, we explain why they fall short in our setting. We then identify the minimal assumption on the target/output under which guarantees are possible, and describe a nonlinear prediction procedure achieving the optimal error bound with high probability. Joint work with Jaouad Mourtada (CREST-ENSAE Paris) and Tomas Vaškevičius (Oxford). |
23.03.2021 |
Maxim Panov (Skoltech) |
Stochastic normalizing flows |
Normalizing flows are one of the central concepts in modern Bayesian machine learning. They allow constructing approximations of complex distributions starting from simple ones (e.g., Gaussian) and transforming them by a series of deterministic mappings. Normalizing flows have found numerous applications, being especially useful in the variational approach to Bayesian inference. In this talk, we will show that classical deterministic normalizing flows have severe limitations which can be overcome by adding stochasticity to the procedure. This can significantly improve the power of the method but requires much more complicated training procedures. We will discuss how one can construct variational learning procedures based on the coupling of normalizing flows with the Metropolis-Hastings accept/reject algorithm. We will also see numerous connections with other classical sampling approaches such as Annealed Importance Sampling (AIS) and Sequential Importance Sampling (SIS). We will illustrate the theory with applications to sampling and the construction of variational autoencoders. (Joint work with Eric Moulines and Achille Thin (Ecole Polytechnique), Alain Durmus (ENS Paris-Saclay), Arnaud Doucet (University of Oxford) and Nikita Kotelevskii (Skoltech)). |
30.03.2021 |
Dmytro Perekrestenko (ETH Zurich) |
Constructive Universal High-Dimensional Distribution Generation through Deep ReLU Networks |
We present an explicit deep neural network construction that transforms uniformly distributed one-dimensional noise into an arbitrarily close approximation of any two-dimensional Lipschitz-continuous target distribution. The key ingredient of our design is a generalization of the “space-filling” property of sawtooth functions discovered in (Bailey & Telgarsky, 2018). We elicit the importance of depth—in our neural network construction—in driving the Wasserstein distance between the target distribution and the approximation realized by the network to zero. We also will outline an extension of our method to output distributions of arbitrary dimension. Finally, we show that the proposed construction does not incur a cost—in terms of error measured in Wasserstein-distance—relative to generating d-dimensional target distributions from d independent random variables. |
06.04.2021 |
Daniil Tiapkin (HSE University) |
Improved Complexity Bounds in the Wasserstein Barycenter Problem |
In this paper, we focus on computational aspects of the Wasserstein barycenter problem. We provide two algorithms to compute the Wasserstein barycenter of m discrete measures of size n with accuracy $\varepsilon$. The first algorithm, based on mirror prox with a specific norm, meets the complexity of the celebrated accelerated iterative Bregman projections (IBP), that is $\widetilde O(mn^2\sqrt{n}/\varepsilon)$, however, with no limitations, unlike (accelerated) IBP, which is numerically unstable when the regularization parameter is small. The second algorithm, based on area-convexity and dual extrapolation, improves the previously best-known convergence rates for the Wasserstein barycenter problem, enjoying $\widetilde O(mn^2/\varepsilon)$ complexity. |
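For orientation, here is a toy sketch of the entropically regularized iterative Bregman projections (IBP) baseline mentioned above, computing a barycenter of discrete measures on a common grid. The grid, the measures and the regularization parameter eps are illustrative assumptions; decreasing eps exhibits the numerical instability referred to in the abstract.

```python
import numpy as np

def ibp_barycenter(cost, measures, weights, eps=0.01, n_iters=500):
    """Entropy-regularized Wasserstein barycenter via iterative Bregman projections."""
    K = np.exp(-cost / eps)                      # Gibbs kernel
    m, n = measures.shape
    v = np.ones((m, n))
    for _ in range(n_iters):
        u = measures / (K @ v.T).T               # scaling updates, one per measure
        logb = weights @ np.log(np.maximum((K.T @ u.T).T, 1e-300))
        b = np.exp(logb)                         # geometric mean = current barycenter
        v = b[None, :] / (K.T @ u.T).T
    return b / b.sum()

# Toy example: barycenter of two 1D histograms on a grid of size n
n = 100
x = np.linspace(0, 1, n)
cost = (x[:, None] - x[None, :]) ** 2
p1 = np.exp(-(x - 0.25) ** 2 / 0.005); p1 /= p1.sum()
p2 = np.exp(-(x - 0.75) ** 2 / 0.005); p2 /= p2.sum()
bary = ibp_barycenter(cost, np.vstack([p1, p2]), weights=np.array([0.5, 0.5]))
print("barycenter mass:", bary.sum())
```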
13.04.2021 |
Alexey Kroshnin (IITP RAS, HSE University) |
Robust k-means clustering in metric spaces |
In this talk we consider robust median-of-means based algorithms for the k-means clustering problem in a metric space. The main results are non-asymptotic excess distortion bounds that hold under the two bounded moments assumption in a general separable metric space. In the case of a Hilbert space our bounds have the sub-Gaussian form depending on the probability mass of the lightest cluster of an optimal quantizer. In the special case of clustering in R^d we prove matching (up to constant factors) non-asymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster and on the variance. The talk is based on joint work with Yegor Klochkov and Nikita Zhivotovskiy. |
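To illustrate the median-of-means idea underlying the robust clustering procedures above, here is a minimal sketch of the basic MOM mean estimator on heavy-tailed scalar data. The data distribution and the number of blocks are assumptions chosen for the example; the k-means version discussed in the talk is considerably more involved.

```python
import numpy as np

def median_of_means(sample, n_blocks, rng=None):
    """Median-of-means estimator: split into blocks, average each block, take the median."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample, dtype=float)
    perm = rng.permutation(len(sample))
    blocks = np.array_split(sample[perm], n_blocks)
    block_means = np.array([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

# Heavy-tailed toy data: Student t with 2.5 degrees of freedom (barely two moments)
rng = np.random.default_rng(0)
data = rng.standard_t(df=2.5, size=10_000)
print("empirical mean:", data.mean(),
      "median of means:", median_of_means(data, n_blocks=50, rng=rng))
```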
20.04.2021 |
Quentin Paris (HSE University) |
Online learning with exponential weights in metric spaces |
This paper addresses the problem of online learning in metric spaces using exponential weights. We extend the analysis of the exponentially weighted average forecaster, traditionally studied in the Euclidean setting, to a more abstract framework. Our results rely on the notion of barycenters, a suitable version of Jensen's inequality and a synthetic notion of lower curvature bound in metric spaces known as the measure contraction property. We also adapt the online-to-batch conversion principle to apply our results to a statistical learning framework. |
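A minimal sketch of the exponentially weighted average forecaster in the familiar Euclidean case with squared loss (the talk extends this scheme to barycenters in metric spaces). The experts, the learning rate eta and the data below are illustrative assumptions.

```python
import numpy as np

def exp_weights_forecast(expert_predictions, outcomes, eta=0.5):
    """Exponentially weighted average forecaster with squared loss (Euclidean case)."""
    n_rounds, n_experts = expert_predictions.shape
    cum_loss = np.zeros(n_experts)
    forecasts = np.empty(n_rounds)
    for t in range(n_rounds):
        w = np.exp(-eta * (cum_loss - cum_loss.min()))   # shift for numerical stability
        w /= w.sum()
        forecasts[t] = w @ expert_predictions[t]          # weighted average of expert advice
        cum_loss += (expert_predictions[t] - outcomes[t]) ** 2
    return forecasts

# Toy run: three constant experts predicting a noisy constant signal
rng = np.random.default_rng(1)
T = 200
experts = np.column_stack([np.full(T, 0.0), np.full(T, 0.5), np.full(T, 1.0)])
y = 0.5 + 0.1 * rng.standard_normal(T)
print("mean squared error:", np.mean((exp_weights_forecast(experts, y) - y) ** 2))
```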
27.04.2021 |
Ivan Oseledets (Skoltech) |
Geometry in machine learning models |
I will talk about the use of a geometric approach to machine learning models and give an overview of our results from recent years, including: a) the application of hyperbolic spaces in machine learning and the construction of recommender models; b) the construction of interpretable directions in the latent space of generative networks. |
12.05.2021-15.05.2021 |
Conference |
New Frontiers in High-dimensional Probability and Applications to Machine Learning |
|
18.05.2021 |
Egor Klochkov (Cambridge University) |
Fast rates for strongly convex optimization via stability |
The sharpest known high probability generalization bounds for uniformly stable algorithms (Feldman, Vondrák, 2018, 2019), (Bousquet, Klochkov, Zhivotovskiy, 2020) contain a generally inevitable sampling error term of order 1/sqrt(n). When applied to excess risk bounds, this leads to suboptimal results in several standard stochastic convex optimization problems. We show that if the so-called Bernstein condition is satisfied, the term O(1/sqrt(n)) can be avoided, and high probability excess risk bounds of order up to O(1/n) are possible via uniform stability. Using this result, we show a high probability excess risk bound with the rate O(log n / n) for strongly convex and Lipschitz losses valid for any empirical risk minimization method. This resolves a question of Shalev-Shwartz, Shamir, Srebro, and Sridharan (2009). We discuss how O(log n / n) high probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption. This is a joint work with Nikita Zhivotovskiy and Olivier Bousquet. |
01.06.2021 |
Sergey Samsonov (HSE University) |
UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms |
Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even precise knowledge of the value function V^{\pi} corresponding to a policy \pi does not provide reliable information on how far the policy \pi is from the optimal one. We present a novel model-free upper value iteration procedure (UVIP) that allows us to estimate the suboptimality gap V*(x) - V^{\pi}(x) from above and to construct confidence intervals for V*. Our approach relies on upper bounds on the solution of the Bellman optimality equation obtained via a martingale approach. We provide theoretical guarantees for UVIP under general assumptions and illustrate its performance on a number of benchmark RL problems. The talk is based on our recent work with Denis Belomestny, Ilya Levin, Eric Moulines, Alexey Naumov and Veronika Zorina. |
22.06.2021 |
Dmitrii Ostrovskii (University of Southern California) |
Near-Optimal Model Discrimination with Non-Disclosure |
In the standard setup of parametric M-estimation with convex loss, we consider two population risk minimizers associated with two possible distributions of a random observation, and then ask the following question: given the value of one of the two minimizers and access to i.i.d. sampling from both distributions, what sample sizes allow one to identify the "right" distribution, i.e., the one corresponding to the given value? Making the first steps towards answering this question in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide nearly matching upper and lower bounds on the sample complexity, showing it to be min{1/Δ^2, \sqrt{r}/Δ} up to a constant factor, where Δ is a measure of separation between the two distributions and r is the maximum (resp., minimum) of the ranks of the design covariance matrices in the case of the upper (resp., lower) bound. This bound is dimension-independent and rank-independent for large enough separation. We then extend this result in two directions: (i) for the general parametric setup in the asymptotic regime; (ii) for generalized linear models in the small-sample regime n ≤ r and under weak moment assumptions. In both cases, we derive sample complexity bounds of a similar form, even under misspecification. Our testing procedures only access the "known" value through a certain functional of the empirical risk. In addition, the number of observations that allows one to reach statistical confidence for testing does not allow one to "resolve" the two models, that is, to recover both minimizers up to O(Δ) prediction accuracy. These two properties open the prospect of applying our testing framework in practical tasks, where one would like to identify a prediction model, which can be proprietary, while guaranteeing that the model cannot be actually inferred by the identifying agent. The talk follows the recent paper arXiv:2012.02901. |
12.10.2021 |
Evgeny Spodarev (Ulm University, Germany) |
Spectral classification of fullerenes |
Joint work with A. Bille (Skoltech, Moscow/Ulm University), V. Buchstaber (Steklov Mathematical Institute, Moscow) and S. Coste (INRIA, Paris). Fullerenes $C_n$ are 3-simple convex compact polytopes with $n$ vertices, $n\ge 20$ even, $n\neq 22$, 12 pentagons and $n/2-10$ hexagons as facets. They are mathematical models of famous carbon molecules which find wide applications in modern nanotechnologies. For the synthesis of $C_{60}$, R. Curl, H. Kroto and R. Smalley received the Nobel Prize in Chemistry (1996). Since the number of distinct fullerene isomers (which are combinatorially non-equivalent versions of $C_n$) grows as $O(n^9)$ for large $n$, their enumeration and classification is a non-trivial mathematical problem. We propose a solution to this problem using the Newton polynomials of the spectra of their dual graphs of hexagons. It appears that the degree of such polynomials has a logarithmic growth order with respect to $n$. If time allows, the spectral linear ordering of characters of $C_n$ for different $n$, as well as the distribution of a random eigenvalue of Laplace-type matrices for the graphene, the regular triangulation of the plane and infinite nanotubes with chiral vector $(p,q)$, will be touched upon. |
23.11.2021-26.11.2021 |
Dmitry Vetrov (HSE University); Alexey Naumov (HSE University); Denis Derkach (HSE University); Mikhail Lazarev (HSE University); Nikita Kazeev (HSE University); Sergey Shirobokov (LCL, Twitter); Maxim Panov (Skoltech); Mikhail Burtsev (MIPT) |
Third HSE-Yandex autumn school on generative models |
Date | Speaker | Title | Abstract |
3.03.2020 |
Nazar Buzun (Skoltech) |
Gaussian approximation of the maximum of a sum of random vectors. |
In this work we consider Gaussian approximation for the maximum of a sum of independent random vectors in high dimension. Our approach is based on approximation in probability; in this case the random variables depend on each other. Approximation in distribution follows directly from approximation in probability. We obtain a theoretical bound on the rate of convergence to the maximum of a normal vector, which substantially improves previously known bounds. |
6.03.2020 |
Alexandra Suvorikova (WIAS Berlin) |
Shape-based domain adaptation via optimal transportation |
The domain adaptation problem aims at learning a well-performing model, trained on source data S (images, vectors, etc.), which is then applied to a different (but related) target sample T. Aside from being attractive due to its obvious practical utility, the setting is challenging from a theoretical point of view. In this work we introduce a novel approach to supervised domain adaptation consisting in a class-dependent fitting based on ideas from optimal transportation (OT) theory, which considers S and T as two mixtures of distributions. A parametrized OT distance is used as a fidelity measure between S and T, providing a toolbox for modelling possibly independent perturbations of the mixture components. The method is then used to describe the adaptation of the immune system in humans after moving to another climatic zone. |
11.03.2020 |
Sergey Bobkov (University of Minnesota, HSE University) |
Gaussian concentration, spherical concentration, and their applications to randomized models in the central limit theorem |
We plan to discuss the so-called concentration of measure phenomenon and the corresponding Sobolev inequalities for multivariate Gaussian measures and for the uniform distribution on the unit sphere in Euclidean space. Besides classical results describing how the degree of concentration increases with the dimension, we will also present new large deviation bounds that improve the standard inequalities for smooth functionals on the sphere. As an illustration of possible applications, we will consider the problem of the rate of convergence to the normal law for weighted sums of dependent variables. |
18.03.2020 |
Sergey Bobkov (University of Minnesota, HSE University) |
Gaussian concentration, spherical concentration, and their applications to randomized models in the central limit theorem |
We plan to discuss the so-called concentration of measure phenomenon and the corresponding Sobolev inequalities for multivariate Gaussian measures and for the uniform distribution on the unit sphere in Euclidean space. Besides classical results describing how the degree of concentration increases with the dimension, we will also present new large deviation bounds that improve the standard inequalities for smooth functionals on the sphere. As an illustration of possible applications, we will consider the problem of the rate of convergence to the normal law for weighted sums of dependent variables. |
24.03.2020 (Zoom) |
Maxim Kaledin (HSE University) |
Risk-Sensitive Reinforcement Learning: a review of the PhD thesis by Aviv Tamar |
The main issue that keeps the applications of reinforcement learning limited is the fact that robustness, stability and durability of the learned decision policies are hard to prove. This happens due to the usage of the expected utility criterion, which is optimized with respect to the policy in all famous RL algorithms. The criterion arose from economics and game theory, where expected utility theory was historically the first well-established concept of rationality. Although very popular and well-studied in both mathematics and applications, the concept itself is not the only possible way to define rationality. In particular, there is a more general class of functionals called coherent risk measures, introduced in Delbaen and Artzner (1998). Coherent risk measures are widely known in finance because they address exactly the robustness problem and try to quantify risk differently; despite that, in RL coherent risk measures are less well known and there are not many papers on their usage. My talk will be devoted to a review of the PhD thesis by Aviv Tamar on exploiting risk measures in RL, which is a very comprehensive study of how to construct learning algorithms around popular risk measures like CVaR and semi-deviation. |
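Since CVaR is the running example of a coherent risk measure above, here is a minimal sketch of its empirical plug-in estimate: the mean loss in the worst (1 - alpha) tail. The loss sample is synthetic, and this is the basic estimator, not a method from the thesis.

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical Conditional Value-at-Risk: mean loss in the worst (1 - alpha) tail."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)          # Value-at-Risk at level alpha
    tail = losses[losses >= var]
    return tail.mean()

# Toy example: losses of a risky "policy" with occasional large losses
rng = np.random.default_rng(2)
losses = np.concatenate([rng.normal(0.0, 1.0, 9_500), rng.normal(8.0, 1.0, 500)])
print("expected loss:", losses.mean(), "CVaR(0.95):", cvar(losses, 0.95))
```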
3.04.2020 (Zoom) |
Maxim Panov and Nikita Kotelevskiy (Skoltech) |
MetFlow: A New Efficient Method for Bridging the Gap between Markov Chain Monte Carlo and Variational Inference |
In this contribution, we propose a new computationally efficient method to combine Variational Inference (VI) with Markov Chain Monte Carlo (MCMC). This approach can be used with generic MCMC kernels, but is especially well suited to MetFlow, a novel family of MCMC algorithms we introduce, in which proposals are obtained using Normalizing Flows. The marginal distribution produced by such MCMC algorithms is a mixture of flow-based distributions, thus drastically increasing the expressivity of the variational family. Unlike previous methods following this direction, our approach is amenable to the reparametrization trick and does not rely on computationally expensive reverse kernels. Extensive numerical experiments show clear computational and performance improvements over state-of-the-art methods. |
21.04.2020 at 16:00 (Zoom) |
Nikita Zhivotovskiy (Google Research, HSE University) |
Fast classification rates in online and statistical learning with abstention. |
This talk is devoted to a simple extension of statistical and online classification setups where the learner is allowed to abstain from a prediction by paying a cost (say, 0.49) only marginally smaller than the average cost 0.5 of a random bit prediction. We discuss how this model implies fast rates for the excess risk and the expected regret without any additional assumptions on the data generating mechanism and the loss function. Finally, we show the relations with recent results in aggregation theory and with the analysis of strongly convex losses. Based on recent works with Olivier Bousquet and Gergely Neu arXiv:1910.12756 and arxiv:2001.10623. |
23.04.2020 at 17:00 (Zoom) |
Anastasia Ivanova (MIPT, HSE University) |
Date | Speaker | Title | Abstract |
17.09.2019 |
Michel Ledoux (Université de Toulouse) |
On optimal matching of random samples |
Optimal matching problems are random variational problems widely investigated in the mathematics and physics literature, and in probability theory and statistics in the study of rates of convergence of empirical measures. Two-dimensional matching of uniform samples in particular gave rise to deep results investigated from various viewpoints (combinatorial, generic chaining, Fourier analysis). After a review of known results, we investigate more precisely the case of Gaussian samples, first in dimension one, on the basis of explicit representations of Kantorovich metrics and a sharp analysis of more general log-concave distributions in terms of their isoperimetric profile (joint work with S. Bobkov), and then in dimension two and higher following the PDE and transportation approach recently put forward by L. Ambrosio, F. Stra and D. Trevisan. |
23-27.09.2019 |
Workshop |
01.10.2019 |
Alexander Goldenshluger (University of Haifa, HSE University) |
Density deconvolution under general assumptions |
In this paper we study the problem of density deconvolution under general assumptions on the measurement error distribution. Typically deconvolution estimators are constructed using Fourier transform techniques, and it is assumed that the characteristic function of the measurement errors does not have zeros on the real line. This assumption is rather strong and is not fulfilled in many cases of interest. In this paper we develop a methodology for constructing optimal density deconvolution estimators in the general setting that covers vanishing and non-vanishing characteristic functions of the measurement errors. We derive upper bounds on the risk of the proposed estimators and provide sufficient conditions under which zeros of the corresponding characteristic function have no effect on estimation accuracy. Moreover, we show that the derived conditions are also necessary in some specific problem instances. |
15.10.2019 |
Sergey Samsonov (HSE University) |
Variance reduction for dependent sequences with applications to Stochastic Gradient MCMC |
This talk will be devoted to a novel variance reduction approach for additive functionals of general dependent sequences. Our approach combines the use of control functions with the minimisation of an empirical asymptotic variance estimate. We analyse finite sample properties of the proposed method and derive convergence rates of the excess asymptotic variance to zero. We apply our methodology to Stochastic Gradient MCMC (SGMCMC) and discuss possible extensions for non-Markovian dependent sequences (SVRG-LD, SAGA-LD). Numerical evaluation of the algorithm will also be presented. The talk is based on a joint work with D. Belomestny, L. Iosipoy, E. Moulines and A. Naumov. |
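A toy sketch of the control-variate idea behind the talk: subtract a zero-mean control function with a coefficient chosen to minimize the empirical variance. The i.i.d. Gaussian setting and the functions f and g below are illustrative assumptions, much simpler than the dependent SGMCMC setting of the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)

f = x ** 2 + x                 # functional whose expectation we estimate (true value 1)
g = x                          # control variate with known zero mean under the target

# Coefficient minimizing the empirical variance of f - beta * g
beta = np.cov(f, g)[0, 1] / np.var(g)
plain = f.mean()
controlled = (f - beta * g).mean()

print("plain estimate:", plain, "with control variate:", controlled)
print("variance reduction factor:", np.var(f) / np.var(f - beta * g))
```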
22.10.2019 |
Quentin Paris (HSE University) |
Some open questions at the "intersection" between optimal transport and statistical learning. |
This talk will briefly describe the field of optimal transport and show that statistical learning can be represented as an optimal transport problem. We will investigate a few questions related to this representation and describe some potential research perspectives. |
26 - 29.11.2019 |
Autumn School |
Autumn School on Generative models |
|
3.12.2019 |
Nikita Puchkin (HSE University) |
Sample complexity of learning a manifold with an unknown dimension |
I will discuss the problem of learning a manifold from noisy observations in the case when the dimension of the true manifold is unknown. Many manifold learning procedures require the intrinsic dimension of the data as an input parameter. However, the existing literature provides theoretical guarantees on the dimension estimation only in the case when the data lies exactly on the manifold. We study an algorithm which adaptively chooses the manifold's dimension and has an optimal sample complexity in terms of the Hausdorff loss. Besides, the procedure produces a manifold such that its curvature does not exceed the curvature of the true manifold too much. |
Date | Speaker | Title | Abstract |
29.01.2019 at CS HSE |
Ekaterina Krymova (Duisburg Essen University) |
On estimation of the noise variance in high dimensional linear regression model |
We consider a problem of unknown noise variance estimation in high-dimensional linear regression model. To estimate a nuisance vector of regression coefficients we use a family of spectral regularisers of the maximum likelihood estimator. Noise variance estimation is based on an adaptive normalisation of the prediction error. We derive an upper bound for concentration of the proposed method around an ideal estimator (in case of zero nuisance). |
05.02.2019 |
Vladimir Spokoiny (WIAS, HSE University) |
Bayesian model selection |
We consider different approaches to select a prior from an ordered family of Gaussian process priors (GPP) for a linear regression model. The empirical and full (hierarchical) Bayesian methods are studied with their benefits and drawbacks. A new method for the choice of a hyper prior is suggested. The method is based on the so-called "propagation" condition and ensures a one-sided contraction and adaptivity of the procedure. |
12.02.2019 |
Alexander Goldenshluger (University of Haifa, HSE University) |
Optimal stopping of a random sequence with unknown distribution |
The subject of this talk is the problem of optimal stopping of a sequence of i.i.d. random variables. When the distribution governing the random sequence is known, the optimal solution is given by backward induction. In this talk we will focus on the case when the distribution of observations is unknown. We propose a stopping rule that is based on relative ranks and study its performance as measured by the maximal relative regret over suitable nonparametric classes of distributions. It is shown that the proposed rule is first order asymptotically optimal and nearly rate–optimal in terms of the rate at which the relative regret converges to zero. We also develop a general method for numerical solution of sequential stopping problems with no distributional information, and use it in order to implement the proposed stopping rule. (Joint work with A. Zeevi) |
19.02.2019 |
Alexey Ustimenko (HSE University) |
Direct Optimization of Ranking Quality Metrics via Stochastic Approximation |
In this paper, we present a novel approach for consistent direct optimization of any ranking quality metric that significantly outperforms the state-of-the-art methods on popular learning-to-rank datasets. The framework is based on a smoothing procedure and a method of local stochastic approximation of the ranking function inspired by gradient-free methods for blackbox optimization. We also present novel unbiased and effective estimates for gradients of the smoothed ranking quality metrics that theoretically outperform likelihood ratio method while their time complexity remains the same due to combinatorial structure of the problem. |
26.02.2019 |
Denis Belomestny (Duisburg Essen University, HSE University) |
McKean-Vlasov stochastic differential equations: simulation and estimation |
In this talk we propose a novel projection-based particle method for solving McKean--Vlasov stochastic differential equations. This approach is based on a projection-type estimation of the marginal density of the solution in each time step. The projection-based particle method leads in many situations to a significant reduction of numerical complexity compared to the widely used kernel density estimation algorithms. We derive strong convergence rates and rates of density estimation. Furthermore we give explicit solutions for a class of McKean--Vlasov equations with affine drift. The performance of the proposed algorithm is illustrated by several numerical examples. |
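For context, here is a minimal sketch of the plain interacting-particle Euler scheme for a McKean-Vlasov SDE, in which the law of the solution is replaced by the empirical measure of the particles. The drift, time step and particle number are illustrative assumptions, and the projection-based density estimation proposed in the talk is not implemented here.

```python
import numpy as np

def mckean_vlasov_particles(n_particles, n_steps, dt, sigma=1.0, rng=None):
    """Euler scheme for the interacting particle approximation of
    dX_t = -(X_t - E[X_t]) dt + sigma dW_t, with E[X_t] replaced by the particle mean."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(n_particles)
    for _ in range(n_steps):
        drift = -(x - x.mean())      # mean-field interaction through the empirical measure
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_particles)
    return x

particles = mckean_vlasov_particles(n_particles=2_000, n_steps=1_000, dt=1e-2)
print("empirical mean:", particles.mean(), "empirical variance:", particles.var())
```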
5.03.2019 |
Ivan Panin (Skoltech, IITP RAS) |
Least squares in noiseless setting and Sobol indices convergence for random design |
We briefly review recent progress in noiseless nonparametric regression and consider its application to global sensitivity analysis, in particular to the estimation of Sobol indices. Sobol (sensitivity) indices allow one to quantify the respective effects of input random variables (or combinations thereof) on the variance of a physical or mathematical model response. We focus on the problem of Sobol indices estimation via the metamodel approach in the case of a random design of experiments, on both the nonasymptotic and asymptotic behavior of Sobol indices estimates, and on model selection for an increasing computational budget. We also consider the question of an optimal rate of Sobol indices convergence. |
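A minimal pick-freeze Monte Carlo sketch of first-order Sobol indices, to fix the object being estimated. The toy model Y = X1 + 2 X2 (whose first-order indices are 1/5 and 4/5) and the sample size are assumptions; the metamodel approach discussed in the talk is not used here.

```python
import numpy as np

def first_order_sobol(f, dim, n_samples, rng=None):
    """Pick-freeze Monte Carlo estimate of first-order Sobol indices for f on [0, 1]^dim."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.uniform(size=(n_samples, dim))
    B = rng.uniform(size=(n_samples, dim))
    y_a = f(A)
    indices = np.empty(dim)
    for i in range(dim):
        C = B.copy()
        C[:, i] = A[:, i]                 # "freeze" the i-th input, resample the rest
        y_c = f(C)
        indices[i] = np.cov(y_a, y_c)[0, 1] / np.var(y_a)
    return indices

# Toy model: Y = X1 + 2*X2 with independent uniform inputs
f = lambda X: X[:, 0] + 2.0 * X[:, 1]
print(first_order_sobol(f, dim=2, n_samples=100_000))
```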
12.03.2019 |
Dmitry Yarotsky (Skoltech, IITP RAS) |
Gradient descent in neural networks: mean field theory and optimal transportation |
I will describe recent results in which gradient descent in shallow neural networks is approximated by an integro-differential equation appearing in the limit of infinite network width. In particular, I will discuss the connection to the optimal transport theory and will mention several modes in which the dynamics admits an approximate analytic solution. |
19.03.2019 |
Egor Kosov (MSU, HSE University) |
Lower bounds for measures of deviation of polynomials from their expectations |
In the talk we will discuss the recent results on lower estimates of small ball probabilities for random variables that are polynomials in Gaussian and general logarithmically concave random variables. |
26.03.2019 |
Nurlan Shagadatov (HSE University) |
Martingale representations for Markov Chains with application to MCMC. |
In this talk I will describe recent results of joint work with D. Belomestny and E. Moulines. I consider an efficient variance reduction approach for Markov Chain Monte Carlo algorithms relying on new discrete time martingale representations. In particular, I will present theoretical guarantees and a generic method for the Unadjusted Langevin Algorithm. |
09.04.2019 |
Evgenii Chzhen (Laboratoire d'Analyse et de Mathématiques Appliquées) |
Semi-supervised confidence set classification |
An image annotation task can be seen as a multi-class classification problem with a large number of classes. In this context, confusion between classes can occur, and single label classification may be misleading. We will talk about multi-class classification with confidence sets of controlled expected cardinality in non-parametric settings. We will propose a semi-supervised estimator which is either minimax optimal or optimal up to a log factor, depending on the way we measure the risk. We will describe the regimes when there is no supervised estimator that can improve the proposed semi-supervised estimator and when semi-supervised estimation does not help from the minimax point of view. (Joint work with C. Denis and M. Hebiri) |
16.04.2019, CS HSE, 205 at 17:00 |
Eric Moulines (Ecole Polytechnique, HSE University) |
Mini course Introduction to reinforcement learning 1 |
The minicourse is an elementary introduction to reinforcement learning. We will formalize the problem of reinforcement learning using ideas from dynamical systems theory, specifically, as the optimal control of incompletely known Markov decision processes. A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects (sensation, action, and goal) in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method. The course is an elementary introduction covering: an introduction to reinforcement learning; Markov decision processes; dynamic programming; Monte Carlo methods; temporal difference learning. We will follow closely the book "Reinforcement Learning: An Introduction" by Sutton and Barto (MIT Press) and use the pedagogical material proposed by David Silver. |
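As a small companion to the temporal-difference part of the outline above, a minimal tabular TD(0) sketch for policy evaluation on a toy two-state Markov reward process. The chain, rewards, step size and discount factor are illustrative assumptions, not course material.

```python
import numpy as np

def td0(transition, reward, gamma, alpha, n_steps, rng=None):
    """Tabular TD(0) policy evaluation on a Markov reward process."""
    rng = np.random.default_rng() if rng is None else rng
    n_states = transition.shape[0]
    v = np.zeros(n_states)
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=transition[s])
        # TD(0) update: move V(s) towards the bootstrapped target r + gamma * V(s')
        v[s] += alpha * (reward[s] + gamma * v[s_next] - v[s])
        s = s_next
    return v

# Toy two-state chain
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
v_td = td0(P, r, gamma=0.9, alpha=0.05, n_steps=200_000)
v_exact = np.linalg.solve(np.eye(2) - 0.9 * P, r)   # Bellman equation V = r + gamma * P V
print("TD(0):", v_td, "exact:", v_exact)
```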
23.04.2019, CS HSE, 503 at 17:00 |
Eric Moulines (Ecole Polytechnique) |
Mini course Introduction to reinforcement learning 1 |
The minicourse is an elementary introduction to reinforcement learning. We will formalize the problem of reinforcement learning using ideas from dynamical systems theory, specifically, as the optimal control of incompletely known Markov decision processes. A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects (sensation, action, and goal) in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method. The course is an elementary introduction covering: an introduction to reinforcement learning; Markov decision processes; dynamic programming; Monte Carlo methods; temporal difference learning. We will follow closely the book "Reinforcement Learning: An Introduction" by Sutton and Barto (MIT Press) and use the pedagogical material proposed by David Silver. |
3.06.2019-7.06.2019 |
Nikolai Krylov (University of Minnesota) |
Mini course (3 lectures) |
Topics:
Schedule: 7.06, 16:00-18:00; 14.06, 16:00-18:00; 28.06, 16:00-18:00. Venue: Faculty of Mathematics HSE, Usacheva str. 5, room 108 |
Date | Speaker | Title | Abstract |
26.12.2018 |
Quentin Paris (HSE University) |
Learning functional minimizers on metric spaces. |
In this talk, we will discuss recent developments motivated by statistical problems that arise in optimal transport. In particular we will discuss new results concerning the estimation of so called barycenters (or Frechet means) in the Wasserstein space. Before presenting our results, the talk will review some basic material from geometry and optimal transport. The end of the talk will discuss some open problems. |
26.12.2018 |
Sergey Samsonov (HSE University, Skoltech) |
Transportation and concentration of measure inequalities for Markov Chains |
In this talk I am going to survey some concentration of measure inequalities, starting from rather classical ones by Bakry and Ledoux. Then I will cover some interesting concentration results for Markov chains related to transportation inequalities, following papers by Djellout, Guillin and Wu (2004) and Joulin and Ollivier (2010), as well as concentration results for quadratic forms of vectors with dependent entries following Adamczak (2015), with particular applications to Unadjusted Langevin Dynamics. |
13.12.2018 |
Sergey Samsonov (HSE University, Skoltech) |
Concentration inequalities and their applications |
In this talk I am going to cover some functional inequalities for dependent sequences of random variables, focusing especially on results for Markov chains based on the work by Djellout, Guillin and Wu. As one of the applications we will consider the Unadjusted Langevin Algorithm, following a paper by Durmus and Moulines, and state some concentration results for it. |
16.12.2018 |
Maxim Kaledin (HSE University, Skoltech) |
Dynamic Programming in Stochastic Optimal Control Problems. |
Stochastic control problems are well known in financial mathematics (option pricing, portfolio management) and in many related areas. Even in the case of a discrete state space and a known transition density, these problems suffer from the curse of dimensionality when deterministic algorithms are used. Keane and Wolpin (1994) and Rust (1997) were the first to prove that this can be avoided for discrete Markov Decision Processes with stochastic algorithms. The modern challenge is to figure out what happens with the complexity when the transition density can only be approximated (RL, ML, finance applications). During my talk I will cover known and new complexity estimates for stochastic control problems as well as several algorithms to solve MDPs. |
29.11.2018 |
Vladimir Ulyanov (HSE University, MSU) |
Non-linear expectations: notion, facts, applications. |
We give a short review of recent developments on problems of probability models under uncertainty using the notion of nonlinear expectations and, in particular, sublinear expectations. We also discuss a new law of large numbers (LLN) and central limit theorem (CLT) and provide non-asymptotic results. The classical LLN and CLT have been widely used in probability theory, statistics and data analysis, as well as in many practical situations such as financial pricing and risk management. They provide a strong and convincing way to explain why in practice normal distributions are so widely utilized. But often a serious problem is that, in general, the "i.i.d." condition is difficult to satisfy. In practice, for most real-time processes and data for which the classical trials and samplings become impossible, the uncertainty of probabilities and distributions cannot be neglected. In fact the abuse of normal distributions in finance and many other industrial or commercial domains has been criticized. The new CLT does not need this strong "i.i.d." assumption. Instead of fixing a probability measure P, one introduces an uncertain subset of probability measures {P_a : a in A} and considers the corresponding sublinear expectation sup_{a in A} E_a X.
|
22.11.2018 |
Nikita Zhivotovskiy (HSE University) |
Covariance estimation: missing observations and heavy tails. |
In this talk, I will discuss some recent results on covariance estimation together with some work in progress. I will start with the subgaussian case when some of the observations are missing: we observe N i.i.d. vectors and want to estimate the covariance matrix. The problem is that some of the components of each vector are missing, that is these components are set to be equal to zero with some probability. We show how to improve the state of the art results in this scenario. Next, we discuss how to apply dimension-free Bernstein type inequalities for vectors with heavy tails under mild moment assumptions.
|
14.11.2018 |
Eric Moulines (Ecole Polytechnique, HSE University) |
Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames (NIPS paper) |
Many applications of machine learning involve the analysis of large data frames — matrices collecting heterogeneous measurements (binary, numerical, counts, etc.) across samples — with missing values. Low-rank models, as studied by Udell et al. (2016), are popular in this framework for tasks such as visualization, clustering and missing value imputation. Yet, available methods with statistical guarantees and efficient optimization do not allow explicit modeling of main additive effects such as row and column, or covariate effects. In this paper, we introduce a low-rank interaction and sparse additive effects (LORIS) model which combines matrix regression on a dictionary and low-rank design, to estimate main effects and interactions simultaneously. We provide statistical guarantees in the form of upper bounds on the estimation error of both components. Then, we introduce a mixed coordinate gradient descent (MCGD) method which provably converges sub-linearly to an optimal solution and is computationally efficient for large scale data sets. We show on simulated and survey data that the method has a clear advantage over current practices. |
14.11.2018 |
Błażej Miasojedow (University of Warsaw) |
Statistical inference for Continuous Time Bayesian Networks. |
Continuous Time Bayesian Networks (CTBN) are a class of multivariate Markov Jump Processes (MJP), where dependence between coordinates is described by a directed graph with possible cycles. CTBNs are a flexible tool for modelling phenomena from different areas such as chemistry, biology, social sciences etc. In the talk, I will discuss the problems of parameter learning and structure learning for CTBNs with a special attention to computational methods. |
8.11.2018 |
Alexey Naumov (HSE University) |
Gaussian approximations for maxima of large number of quadratic forms of high-dimensional random vectors |
Let X_1, …, X_n be i.i.d. random vectors taking values in R^d, d \geq 1, and Q_1, …, Q_p, p \geq 1, be symmetric positive definite matrices. We consider the distribution function of vector (Q_j S_n, S_n), j = 1, …, p, where S_n = n^{-1/2}(X_1 + … + X_n), and prove the rate of Gaussian approximation with explicit dependence on n, p and d. We also compare this result with results of Bentkus (2003) and Chernozhukov, Chetverikov, Kato (2016). Applications to change point detection and model selection will be discussed as well. The talk is based on the joint project with F. Goetze, V. Spokoiny and A. Tikhomirov.
|
1.11.2018 |
Andzhey Koziuk (WIAS, Berlin) |
Exact smoothing of a probability measure |
In the talk an exact smoothing of the indicator function is introduced and developed to allow differential calculus to access the internal machinery behind the problem of comparing measures. Specifically, the problem of the accuracy of a non-classical analogue of the Berry-Esseen inequality for random elements in a Hilbert space is addressed with the help of this tool, aided with the classical Lindeberg chained sum construction. |
25.10.2018 |
Nikita Zhivotovskiy (Technion, HSE University) |
Robust estimation of the mean of a random vector |
In this talk, we discuss the estimation of the mean of a random vector from an i.i.d. sample of its observations. The latest results show that, using a special procedure, the mean of a random vector for which only the existence of a covariance matrix is known can be estimated as precisely as if the vector had a multidimensional normal distribution with the same covariance structure. The estimation of the covariance matrix with minimal constraints on the underlying distribution will be briefly discussed. The talk is based on the works arXiv:1702.00482 and arXiv:1809.10462 |
18.10.2018 |
Alexey Ustimenko (HSE University, Skoltech) |
Convergence of Laplacian Spectra from Point Clouds |
The Laplace-Beltrami operator (LBO) on manifolds and its spectral structure have been widely used in many applications such as manifold learning, spectral clustering, dimensionality reduction and so on. Recently, the Point Integral Method (PIM) was proposed to estimate the LBO by a discrete Laplace operator constructed from a set of sample points. We will discuss the paper "Convergence of Laplacian Spectra from Point Clouds" by Zuoqiang Shi and Jian Sun and their proof of the convergence of the eigenvalues and eigenvectors obtained by PIM to the eigenvalues and eigenfunctions of the LBO with the Neumann boundary condition. |
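To give a feel for the object studied, here is a minimal sketch that builds a Gaussian-kernel graph Laplacian from a point cloud on the unit circle and looks at its smallest eigenvalues. This is the simplest graph-Laplacian construction, not the Point Integral Method of the paper; the bandwidth is an illustrative assumption, and recovering LBO eigenvalues also requires a rescaling that the convergence theory makes precise.

```python
import numpy as np

def graph_laplacian_spectrum(points, bandwidth, n_eigs=5):
    """Unnormalized graph Laplacian from a Gaussian kernel and its smallest eigenvalues."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (4.0 * bandwidth))     # kernel weight matrix
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                # L = D - W
    return np.linalg.eigvalsh(L)[:n_eigs]

# Point cloud sampled near the unit circle (a 1-dimensional manifold in R^2)
rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, size=500)
cloud = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.01 * rng.standard_normal((500, 2))
print(graph_laplacian_spectrum(cloud, bandwidth=0.05))
```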
11.10.2018 |
Nikita Puchkin (HSE University) |
Manifold Learning |
I will discuss the problem of manifold reconstruction from a noisy point cloud. There are plenty of manifold learning methods (ISOMAP, LLE, etc.) but only a few papers provide a theoretical justification of the proposed algorithms (Genovese et al. (2012), Maggioni et al. (2016), Aamari and Levrard (2017)). An approach based on local polynomial estimation (Aamari and Levrard (2017)) reconstructs distinct parts of the manifold, and the reconstructed manifold is often disconnected. Another approach (Genovese et al. (2012)) tries to estimate a density supported on the manifold. This approach requires a specific distribution of the noise, which is unlikely to hold in practice. A recently proposed method (Osher et al. (2017)), based on an approximation of the Laplace-Beltrami operator, is aimed at a global reconstruction of the manifold and does not have the mentioned drawbacks. Unfortunately, the statistical properties of this method have not been studied so far. In my talk, I will discuss some details concerning a theoretical justification of the procedure. |
4.10.2018 |
Valentina Shumovskaya (HSE University, Skoltech) |
Towards Hypothesis Testing for Random Graphs with Community Structure |
The analysis of random graphs and network analysis has recently become an active area of research. We will discuss the problem of testing between two populations of inhomogeneous random graphs defined on the same set of vertices, where the null hypothesis means that the underlying edge probability matrices coincide. We propose a new approach to random graph testing based on a community detection procedure: we introduce a structural assumption, namely that our graphs have community structures with k communities. |
27.09.2018 |
Alexander Goldenshluger (University of Haifa, HSE University) |
Nonparametric estimation in non-regular deconvolution problems. |
I will discuss problems of nonparametric estimation in non-regular deconvolution models. In such settings standard estimation methods based on the Fourier transform are not applicable. I will review the existing literature on the topic, discuss general ideas for constructing optimal estimators, and present some preliminary results. Based on joint ongoing work with D. Belomestny. |
20.09.2018 |
Alexander Timofeev (MIPT) |
Density-sensitive semisupervised inference |
In the talk, we will review the article by Martin Azizyan, Aarti Singh and Larry Wasserman. We will consider semisupervised inference for the regression problem, based on a density-sensitive metric, and show that the proposed semisupervised method outperforms any supervised one for certain sets. At the end of the talk, we will demonstrate how to calculate the density-sensitive metric. |
13.09.2018 |
Katya Krymova (University of Duisburg-Essen) |
On a sparse modification of projection approximation subspace tracking method |
In this talk we revisit the well-known constrained projection approximation subspace tracking algorithm (CPAST) and derive non-asymptotic bounds on its error. Furthermore we introduce a novel sparse modification of CPAST which is able to exploit sparsity of the underlying covariance structure. We present a non-asymptotic analysis of the proposed algorithm and study its empirical performance on simulated and real data. |
10.09.2018 |
Steve Oudot (Ecole Polytechnique) |
Statistics and Learning with topological descriptors |
In this survey talk I will review some of the recent advances in Topological Data Analysis to define stable descriptors for data, to use these descriptors for learning tasks, and to do statistics on them. Among other topics, I will address finding suitable metrics, defining and computing means and barycenters, deriving confidence intervals, laws of large numbers, central limit theorems, and kernel mean embeddings. |