# Scientific seminar

Structural Learning Seminar is a seminar for researchers, Master’s and Ph.D. students of HSE, Skoltech, IITP. This seminar is devoted to the problems of Mathematics and Computer Science and covers areas such as Mathematical Statistics, Machine Learning, Random Matrix Theory, Optimization, Information Theory, etc.

Seminar organizers: Vladimir Spokoiny, Leonid Iosipoi, Alexey Naumov, Maxim Panov.

The seminar takes place on Thursday at IITP RAS, room 615 at 5 PM .

Seminar schedule. Follow the seminar announcements in Telegram.

**Spring 2019**

Date | Speaker | Title | Abstract |

29.01.2019 at CS HSE | Ekaterina Krymova (Duisburg Essen University) | On estimation of the noise variance in high dimensional linear regression model | We consider a problem of unknown noise variance estimation in high-dimensional linear regression model. To estimate a nuisance vector of regression coefficients we use a family of spectral regularisers of the maximum likelihood estimator. Noise variance estimation is based on an adaptive normalisation of the prediction error. We derive an upper bound for concentration of the proposed method around an ideal estimator (in case of zero nuisance). |

05.02.2019 | Vladimir Spokoiny (WIAS, HSE) | Bayesian model selection | We consider different approaches to select a prior from an ordered family of Gaussian process priors (GPP) for a linear regression model . The empirical and full (hierarchic) Bayesian methods are studied with their benefits and drawbacks. A new method for the choice of a hyper prior is suggested. The method is based on the so called “propagation” condition and ensures a one-sided contraction and adaptivity of the procedure. |

12.02.2019 | Alexander Goldenshluger (University of Haifa, HSE) | Optimal stopping of a random sequence with unknown distribution | The subject of this talk is the problem of optimal stopping of a sequence of i.i.d. random variables. When the distribution governing the random sequence is known, the optimal solution is given by backward induction. In this talk we will focus on the case when the distribution of observations is unknown. We propose a stopping rule that is based on relative ranks and study its performance as measured by the maximal relative regret over suitable nonparametric classes of distributions. It is shown that the proposed rule is first order asymptotically optimal and nearly rate–optimal in terms of the rate at which the relative regret converges to zero. We also develop a general method for numerical solution of sequential stopping problems with no distributional information, and use it in order to implement the proposed stopping rule.
(Joint work with A. Zeevi) |

19.02.2019 | Alexey Ustimenko (HSE) | Direct Optimization of Ranking Quality Metrics via Stochastic Approximation | In this paper, we present a novel approach for consistent direct optimization of any ranking quality metric that significantly outperforms the state-of-the-art methods on popular learning-to-rank datasets. The framework is based on a smoothing procedure and a method of local stochastic approximation of the ranking function inspired by gradient-free methods for blackbox optimization. We also present novel unbiased and effective estimates for gradients of the smoothed ranking quality metrics that theoretically outperform likelihood ratio method while their time complexity remains the same due to combinatorial structure of the problem. |

26.02.2019 | Denis Belomestny (Duisburg Essen University, HSE) | MacKeen-Vlasov stochastic differential equations: simulation and estimation | In this talk we propose a novel projection-based particle method for solving McKean--Vlasov stochastic differential equations. This approach is based on a projection-type estimation of the marginal density of the solution in each time step. The projection-based particle method leads in many situations to a significant reduction of numerical complexity compared to the widely used kernel density estimation algorithms. We derive strong convergence rates and rates of density estimation. Furthermore we give explicit solutions for a class of McKean--Vlasov equations with affine drift. The performance of the proposed algorithm is illustrated by several numerical examples. |

5.03.2019 | Ivan Panin (Skoltech, IITP RAS) | Least squares in noiseless setting and Sobol indices convergence for random design | We briefly review recent progress in noiseless nonparametric regression and consider its application to global sensitivity analysis and particularly for estimation of Sobol indicies. Sobol (sensitivity) indices allow to quantify respective effects of input random variables (or combinations thereof) onto variance of a physical or mathematical model response. We focus on the problem of Sobol indices estimation via metamodel approach in the case of random design of experiments; on both nonasymptotic and asymptotic behavior of Sobol indices estimates and the model selection for increasing computational budget. We also consider the question of an optimal rate of Sobol indices convergence. |

12.03.2019 | Dmitry Yarotsky (Skoltech, IITP RAS) | Gradient descent in neural networks: mean field theory and optimal transportation | I will describe recent results in which gradient descent in shallow neural networks is approximated by an integro-differential equation appearing in the limit of infinite network width. In particular, I will discuss the connection to the optimal transport theory and will mention several modes in which the dynamics admits an approximate analytic solution. |

19.03.2019 | Egor Kosov (MSU, HSE) | Lower bounds for measures of deviation of polynomials from their expectations | In the talk we will discuss the recent results on lower estimates of small ball probabilities for random variables that are polynomials in Gaussian and general logarithmically concave random variables. |

26.03.2019 | Nurlan Shagadatov (HSE) | Martingale representations for Markov Chains with application to MCMC. | In this talk I will describe recent results of joint work with D. Belomestny and E. Moulines. I consider efficient variance reduction approach for Markov Сhain Monte Carlo algorithms relying on a new discrete time martingale representations. In particular, I will consider theoretical guarantees and generic method for Unadjusted Langevin Algorithm. |

09.04.2019 | Evgenii Chzhen ( Laboratoire d'Analyse et de Mathématiques Appliquées) | Semi-supervised confidence set classification | An image annotation can be seen as a multi-class classification problem with a large number of classes. In this context, confusion between classes can occur, and single label classification may be misleading. We will talk about the multi-class classification with confidence sets of a controlled expected cardinal in non-parametric settings. We will propose a semi-supervised estimator which is either minimax optimal or optimal up to a log factor depending on the way we measure the risk. We will describe the regimes when there is no supervised estimator that can improve the proposed semi-supervised estimator and when the semi-supervised estimation does not help from the minimax point of view. (joint work with C. Denis and M. Hebiri) |

16.04.2019, CS HSE, 205 at 17:00 | Eric Moulines (Ecole Polytechnique, HSE) | Mini course Introduction to reinforcement learning 1 | The minicourse is an elementary introduction of reinforcement learning. We will formalize the problem of reinforcement learning using ideas from dynamical systems theory, specifically, as the optimal control of incompletely-known Markov decision processes. A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method. The course is an elementary introduction covering: — An introduction to reinforcement learning — Markov decision processes — Dynamic programming — Monte Carlo methods — Temporal difference learning We will follow closely the book “Reinforcement Learning: an introduction”, Barto Sutton, MIT Press and the use the pedagogical material proposed by David Silver. |

23.04.2019, CS HSE, 503 at 17:00 | Eric Moulines (Ecole Polytechnique) | Mini course Introduction to reinforcement learning 1 | The minicourse is an elementary introduction of reinforcement learning. We will formalize the problem of reinforcement learning using ideas from dynamical systems theory, specifically, as the optimal control of incompletely-known Markov decision processes. A learning agent must be able to sense the state of its environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method. The course is an elementary introduction covering: — An introduction to reinforcement learning — Markov decision processes — Dynamic programming — Monte Carlo methods — Temporal difference learning We will follow closely the book “Reinforcement Learning: an introduction”, Barto Sutton, MIT Press and the use the pedagogical material proposed by David Silver. |

30.04.2019 | TBA () | TBA | TBA |

14.05.2019 | TBA () | TBA | TBA |

21.05.2019 | TBA () | TBA | TBA |

28.05.2019 | TBA () | TBA | TBA |

3.06.2019-7.06.2019 | Nikolai Krylov (University of Minnesote) | Mini course (3 lectures) | Place and time: TBA |

**Fall 2018**

Date | Speaker | Title | Abstract |

26.12.2018 | Qeuntin Paris (HSE) | Learning functional minimizers on metric spaces. | In this talk, we will discuss recent developments motivated by statistical problems that arise in optimal transport. In particular we will discuss new results concerning the estimation of so called barycenters (or Frechet means) in the Wasserstein space. Before presenting our results, the talk will review some basic material from geometry and optimal transport. The end of the talk will discuss some open problems. |

26.12.2018 | Sergey Samsonov (HSE, Skoltech) | Tansportation and concentration of measure inequalities for Markov Chains | In this talk I am going to observe some concentration of measure inequalities, starting from rather classical ones by Bakri, Ledoux. Then I will cover some interesting concentration results for Markov Chains, related with transportation inequalities following papers by Djellot, Guillin, Wu 2004 and Joulin, Olivier 2010, and concentration results for quadratic forms of vectors with dependent entries following Adamczak 2015 with particular applications to Unadjusted Langevin Dynamics. |

13.12.2018 | Sergey Samsonov (HSE, Skoltech) | Concentration inequalities and their applications | In this talk I am going to cover some functional inequalities for dependent sequences of random variables, especially focusing on the results for Markov chains based on the work by Djellout, Guilin and Wu. As one of the applications we will consider Unadjusted Langevin Algorithm, following paper by Durmus, Moulines, and state some concentration results for it. |

6.12.2018 | Maxim Kaledin (HSE, Skoltech) | Dynamic Programming in Stochastic Optimal Control Problems. | Stochastic control problems are well-known in financial mathematics (option pricing, portfolio management) and in many related areas. These problems even in case of discrete state space and known transition density are subject to curse of dimensionality when using deterministic algorithms. Kean&Wolpin(1994) and Rust (1997) were the first who proved that this can be avoided in case of discrete Markov Decision Processes with stochastic algorithms. The modern challenge is to figure out what happens with complexity when the transition density can only be approximated (RL,ML, Finance applications). During my talk I will cover known and new complexity estimates for stochastic control problems as well as several algorithms to solve MDP. |

29.11.2018 | Vladimir Ulyanov (HSE, MSU) | Non-linear expectations: notion, facts, applications. | We give a short review on recent developments on problems of probability model under uncertainty by using the notion of nonlinear expectations and, in particular, sublinear expectations. We discuss as well a new law of large numbers (LLN) and central limit theorem (CLT) and provide non-asymptotic results. Classical LLN and CLT have been widely used in probability theory, statistics, data analysis as well as in many practical situations such as financial pricing and risk management. They provide a strong and convincing way to explain why in practice normal distributions are so widely utilized. But often a serious problem is that, in general, the “i.i.d.” condition is difficult to be satisfied. In practice, for the most real-time processes and data for which the classical trials and samplings become impossible, the uncertainty of probabilities and distributions can not be neglected. In fact the abuse of normal distributions in finance and many other industrial or commercial domains has been criticized. The new CLT does not need this strong “i.i.d.” assumption. Instead of fixing a probability measure P, it is introduced an uncertain subset of probability measures {P_a : a in A } and considered the corresponding sublinear expectation as sup{a in A} E_{a}X. |

22.11.2018 | Nikita Zhivotovskiy (HSE) | Covariance estimation: missing observations and heavy tails. | In this talk, I will discuss some recent results on covariance estimation together with some work in progress. I will start with the subgaussian case when some of the observations are missing: we observe N i.i.d. vectors and want to estimate the covariance matrix. The problem is that some of the components of each vector are missing, that is these components are set to be equal to zero with some probability. We show how to improve the state of the art results in this scenario. Next, we discuss how to apply dimension-free Bernstein type inequalities for vectors with heavy tails under mild moment assumptions. |

14.11.2018 | Eric Moulines (Ecole Polytechnique, HSE) | Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames (NIPS paper) | Many applications of machine learning involve the analysis of large data frames — matrices collecting heterogeneous measurements (binary, numerical, counts, etc.) across samples — with missing values. Low-rank models, as studied by Udell et al. (2016), are popular in this framework for tasks such as visualization, clustering and missing value imputation. Yet, available methods with statistical guarantees and efficient optimization do not allow explicit modeling of main additive effects such as row and column, or covariate effects. In this paper, we introduce a low-rank interaction and sparse additive effects (LORIS) model which combines matrix regression on a dictionary and low-rank design, to estimate main effects and interactions simultaneously. We provide statistical guarantees in the form of upper bounds on the estimation error of both components. Then, we introduce a mixed coordinate gradient descent (MCGD) method which provably converges sub-linearly to an optimal solution and is computationally efficient for large scale data sets. We show on simulated and survey data that the method has a clear advantage over current practices. |

14.11.2018 | Błażej Miasojedow (University of Warsaw) | Statistical inference for Continous Time Bayesian Networks. | Continuous Time Bayesian Networks (CTBN) are a class of multivariate Markov Jump Processes (MJP), where dependence between coordinates is described by a directed graph with possible cycles. CTBNs are a flexible tool for modelling phenomena from different areas such as chemistry, biology, social sciences etc. In the talk, I will discuss the problems of parameter learning and structure learning for CTBNs with a special attention to computational methods. |

8.11.2018 | Alexey Naumov (HSE) | Gaussian approximations for maxima of large number of quadratic forms of high-dimensional random vectors | Let X_1, … , X_n be i.i.d. random vectors taking values in R^d, d \geq 1, and Q_1, … , Q_p, p \geq 1, be symmetric positive definite matrices. We consider the distribution function of vector (Q_j S_n, S_n), j = 1, … , p, where S_n = n^{-1/2}(X_1 + … + X_n), and prove the rate of Gaussian approximation with explicit dependence on n, p and d. We also compare this result with results of Bentkus (2003) and Chernozhukov, Chetverikov, Kato (2016). Applications to change point detection and model selection will be discussed as well. The talk is based on the joint project with F. Goetze, V. Spokoiny and A. Tikhomirov. |

1.11.2018 | Andzhey Koziuk (WIAS, Berlin) | Exact smoothning of a probability measure | In the talk an exact smoothing of indicator function is introduced and developed to allow differential calculus access the internal machinery behind a problem of measures comparison. Specifically, the problem of accuracy of non-classical analogue of Berry-Esseen inequality for a radnom elemets in Hilbert space is addressed with the help of the tool and aided with the classical Lindenberg chained sum construction. |

25.10.2018 | Nikita Zhivotoskiy (Technion, HSE) | Robust estimation of the mean of a random vector | In this talk, we discuss the estimation of the mean random of a vector from an i.i.d. sample of its observations. The latest results show that using a special procedure, the mean of a random vector, for which only the existence of a covariance matrix is known, can be estimated as precisely as if the vector had a multidimensional normal distribution with the same covariance structure. The estimation of the covariance matrix with minimal constraints on the underlying distribution will be briefly discussed. The talk is based on the works of arxiv: 1702.00482 and arxiv: 1809.10462 |

18.10.2018 | Alexey Ustimenko (HSE/Skoltech) | Convergence of Laplacian Spectra from Point Clouds | The Laplacian-Beltrami operator (LBO) on manifolds and its spectral structure has been widely used in many applications such as manifold learning, spectral clustering, dimensionality reduction and so on. Recently, Point Integral Method (PIM) was proposed to estimate the LBO by the discrete Laplace operator constructed from a set of sample points. We would discuss a paper “Convergence of Laplacian Spectra from Point Clouds” by Zuoqiang Shi, Jian Sun and their’s proof of the convergence of the eigenvalues and eigenvectors obtained by PIM to the eigenvalues and eigenfunctions of the LBO with the Neumann boundary. |

11.10.2018 | Nikita Puchkin (HSE) | Manifold Learning | I will discuss a problem of a manifold reconstruction from a noisy point cloud. There are plenty methods of manifold learning (ISOMAP, LLE, etc.) but only a few papers provide a theoretical justification of the proposed algorithms (Genovese et. al. (2012), Maggioni et. al. (2016), Aamari and Levrard (2017)). An approach based on a local polynomial estimation (Aamari, Levrard (2017)) reconstructs distinct parts of the manifold and the reconstructed manifold is often disconnected. Another approach (Genovese et. al. (2012)) tries to estimate a density with a support on the manifold. This approach requires a specific distribution of the noise, which is unlikely to hold in practice. A recently proposed method (Osher et. al. (2017)), based on an approximation of the Laplace-Beltrami operator, is aimed at a global reconstruction of the manifold and does not have the mentioned drawbacks. Unfortunately, the statistical properties of this method are not studied so far. In my talk, I will discuss some details concerning a theoretical justification of the procedure. |

4.10.2018 | Valentina Shumovskaya (HSE, Skoltech) | Towards Hypothesis Testing for Random Graphs with Community Structure | The analysis of random graphs and network analysis recently become an active area of research. We will dicuss the problem of testing between two populations of inhomogeneous random graphs defined on the same set of vertices, where the null hypothesis means that the underlying edge probability matrices coincide. We propose a new approach of random graphs testing based on community detection procedure: we introduce a structural assumption, which is that our graphs have community structures with k communities. |

27.09.2018 | Alexander Goldenshluger (University of Haifa, HSE) | Nonparamertic estimation in non-regular deconvolution problems. | I will discuss problems of nonparametric estimation in non-regular deconvolution models. In such settings standard estimation methods based on the Fourier transform are not applicable. I will review the existing literature on the topic, discuss general ideas for constructing optimal estimators, and present some preliminary results. Based on joint ongoing work with D. Belomestny. |

20.09.2018 | Alexander Timofeev (MIPT) | Density-sensitive semisupervised inference | In the talk, we will review the article written by Martin Azizyan, Aarti Singh and Larry Wasserman. We will consider semisupervised inference for the regression problem, based on a density-sensitive metric and show that the proposed semisupervised method outperforms any supervised one for certain sets. At the end of the talk, We will demonstrate how to calculate the density-sensitive metric. |

13.09.2018 | Katya Krymova (University of Duisburg-Essen) | On a sparse modification of projection approximation subspace tracking method | In this talk we revisit the well-known constrained projection approximation subspace tracking algorithm (CPAST) and derive non-asymptotic bounds on its error. Furthermore we introduce a novel sparse modification of CPAST which is able to exploit sparsity of the underlying covariance structure. We present a non-asymptotic analysis of the proposed algorithm and study its empirical performance on simulated and real data. |

10.09.2018 | Steve Oudot (Ecole Polytechnique) | Statistics and Learning with topological descriptors | I this survey talk I will review some of the recent advances in Topological Data Analysis to define stable descriptors for data, to use these descriptors for learning tasks, and to do statistics on them. Among other topics, I will address finding suitable metrics, defining and computing means and barycenters, deriving confidence intervals, laws of large numbers, central limit theorems, and kernel mean embeddings. |

Have you spotted a typo?

Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

To be used only for spelling or punctuation mistakes.