# Publications

This paper provides rates of convergence for empirical (generalised) barycenters on compact geodesic metric spaces under general conditions using empirical processes techniques. Our main assumption is termed a variance inequality and provides a strong connection between usual assumptions in the field of empirical processes and central concepts of metric geometry. We study the validity of variance inequalities in spaces of non-positive and non-negative Aleksandrov curvature. In this last scenario, we show that variance inequalities hold provided geodesics, emanating from a barycenter, can be extended by a constant factor. We also relate variance inequalities to strong geodesic convexity. While not restricted to this setting, our results are largely discussed in the context of the 2-Wasserstein space.
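
As a minimal illustration of an empirical barycenter in the 2-Wasserstein space, the sketch below covers only a special case: measures on the real line with the same number of equally weighted atoms. In that case the barycenter is obtained by averaging order statistics. The function names are hypothetical and the example is not taken from the paper.

```python
def wasserstein_barycenter_1d(samples):
    """Equal-weight 2-Wasserstein barycenter of empirical measures on the real line.

    Each element of `samples` is a list of atoms carrying uniform mass. All
    lists must have the same length so that atoms can be matched by rank.
    """
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("all samples must have the same number of atoms")
    sorted_samples = [sorted(s) for s in samples]
    # In one dimension the optimal coupling is monotone, so the i-th atom of
    # the barycenter is the average of the i-th order statistics.
    return [sum(atoms) / len(atoms) for atoms in zip(*sorted_samples)]


def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size empirical measures."""
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


measures = [[0.0, 1.0, 2.0], [12.0, 10.0, 11.0]]
bary = wasserstein_barycenter_1d(measures)
dist_to_first = w2_squared_1d(measures[0], bary)
```

Here `bary` lies halfway between the two input measures along the geodesic that connects them.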

This work establishes fast rates of convergence for empirical barycenters over a large class of geodesic spaces with curvature bounds in the sense of Alexandrov. More specifically, we show that parametric rates of convergence are achievable under natural conditions that characterize the bi-extendibility of geodesics emanating from a barycenter. These results largely advance the state-of-the-art on the subject both in terms of rates of convergence and the variety of spaces covered. In particular, our results apply to infinite-dimensional spaces such as the 2-Wasserstein space, where bi-extendibility of geodesics translates into regularity of Kantorovich potentials.

This paper addresses the problem of prediction with expert advice for outcomes in a geodesic space with non-positive curvature in the sense of Alexandrov. Via geometric considerations, and in particular the notion of barycenters, we extend to this setting the definition and analysis of the classical exponentially weighted average forecaster. We also adapt the principle of online-to-batch conversion to this setting. We briefly discuss the application of these results in the context of aggregation and for the problem of barycenter estimation.
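
For orientation, here is a minimal sketch of the classical exponentially weighted average forecaster in the Euclidean (real-valued) case with squared loss. In the geodesic setting of the paper, the weighted average would be replaced by a weighted barycenter. The function name, the learning rate `eta` and the toy data are assumptions made for illustration.

```python
import math


def ewa_forecast(expert_preds, outcomes, eta):
    """Exponentially weighted average forecaster for real outcomes, squared loss.

    expert_preds[t][i] is the prediction of expert i at round t.
    Returns the forecasts and the cumulative loss of each expert.
    """
    n_experts = len(expert_preds[0])
    cum_loss = [0.0] * n_experts
    forecasts = []
    for preds, y in zip(expert_preds, outcomes):
        # Weights are proportional to exp(-eta * cumulative loss); subtracting
        # the minimum loss does not change them and avoids underflow.
        base = min(cum_loss)
        weights = [math.exp(-eta * (loss - base)) for loss in cum_loss]
        total = sum(weights)
        weights = [w / total for w in weights]
        # In Euclidean space the weighted barycenter is the weighted mean.
        forecasts.append(sum(w * p for w, p in zip(weights, preds)))
        # The outcome is revealed only after the forecast has been made.
        for i, p in enumerate(preds):
            cum_loss[i] += (p - y) ** 2
    return forecasts, cum_loss


# Toy run: expert 0 always predicts 0, expert 1 always predicts 1, truth is 1.
rounds = 20
forecasts, cum_loss = ewa_forecast([[0.0, 1.0]] * rounds, [1.0] * rounds, eta=1.0)
```

The first forecast is the unweighted mean of the experts. Later forecasts concentrate on the expert with the smaller cumulative loss.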

The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to the finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $\Omega(1/k)$. A simple numerical experiment is presented to support our theory.
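
To make the scheme concrete, the following sketch runs a scalar linear two-timescale SA recursion with i.i.d. Gaussian (martingale-difference) noise. The coefficients, step size schedules and noise level are assumptions chosen for illustration. This is not the experiment reported in the paper and it does not cover the Markovian noise case.

```python
import random


def two_timescale_sa(n_iter=20000, noise_std=0.1, seed=0):
    """Scalar linear two-timescale SA with i.i.d. noise (illustrative only).

    Solves  a11*theta + a12*w = b1  and  a21*theta + a22*w = b2,
    whose unique solution for the coefficients below is theta = w = 1.
    """
    rng = random.Random(seed)
    a11, a12, b1 = 2.0, 1.0, 3.0
    a21, a22, b2 = 1.0, 2.0, 3.0
    theta, w = 0.0, 0.0
    for k in range(n_iter):
        beta = 1.0 / (k + 10)                  # slow step size
        gamma = 1.0 / (k + 10) ** (2.0 / 3.0)  # fast step size, beta/gamma -> 0
        # Both updates use the current pair (theta, w) and noisy residuals.
        theta_next = theta + beta * (
            b1 - a11 * theta - a12 * w + rng.gauss(0.0, noise_std)
        )
        w = w + gamma * (b2 - a21 * theta - a22 * w + rng.gauss(0.0, noise_std))
        theta = theta_next
    return theta, w


theta_hat, w_hat = two_timescale_sa()
```

The slow iterate `theta` uses a step size of order $1/k$ and the fast iterate `w` a larger step size of order $1/k^{2/3}$, so their ratio vanishes as $k$ grows.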

In this paper we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample variance. This feature is theoretically demonstrated by means of a deep non-asymptotic analysis of a variance reduced functional as well as by a thorough simulation study. In particular, we apply our method to various MCMC Bayesian estimation problems where it compares favourably to the existing variance reduction approaches.

Let F_n denote the distribution function of the normalized sum of n i.i.d. random variables. In this paper, polynomial rates of approximation of F_n by the corrected normal laws are considered in the model where the underlying distribution has a convolution structure. As a basic tool, the convergence part of Khinchine’s theorem in metric theory of Diophantine approximations is extended to the class of product characteristic functions.

A new concept of (δ, L)-model of a function that is a generalization of the Devolder–Glineur–Nesterov (δ, L)-oracle is proposed. Within this concept, the gradient descent and fast gradient descent methods are constructed and it is shown that constructs of many known methods (composite methods, level methods, conditional gradient and proximal methods) are particular cases of the methods proposed in this paper.

A mixed data frame (MDF) is a table collecting categorical, numerical, and count observations. MDFs are widely used in statistics, with applications ranging from abundance data in ecology to recommender systems. In many cases, an MDF simultaneously exhibits *main effects*, such as row, column, or group effects, and *interactions*, for which a low-rank model has often been suggested. Although the literature on low-rank approximations is very substantial, with few exceptions, existing methods do not allow one to incorporate main effects and interactions while providing statistical guarantees. The present work fills this gap. We propose an estimation method which simultaneously recovers the main effects and the interactions. We show that our method is near optimal under conditions which are met in our targeted applications. We also propose an optimization algorithm which provably converges to an optimal solution. Numerical experiments reveal that our method, mimi, performs well when the main effects are sparse and the interaction matrix has low rank. We also show that mimi compares favorably to existing methods, in particular when the main effects are significantly large compared to the interactions, and when the proportion of missing entries is large. The method is available as an R package on the Comprehensive R Archive Network. Supplementary materials for this article are available online.

Two transforms of functions on a half-line are considered. It is proved that their composition gives a concave majorant for every non-negative function. In particular, this composition is the identity transform on the class of non-negative concave functions. Applications of this result in the operator theory of Hilbert space and in the theory of quantum systems are mentioned. Several open problems are formulated.

In this paper we study the problem of pointwise density estimation from observations with multiplicative measurement errors. We elucidate the main feature of this problem: the influence of the estimation point on the estimation accuracy. In particular, we show that, depending on whether this point is separated away from zero or not, there are two different regimes in terms of the rates of convergence of the minimax risk. In both regimes we develop kernel-type density estimators and prove upper bounds on their maximal risk over suitable nonparametric classes of densities. We show that the proposed estimators are rate-optimal by establishing matching lower bounds on the minimax risk. Finally we test our estimation procedures on simulated data.

Let X_1, ..., X_n be an i.i.d. sample in R^p with zero mean and covariance matrix S. The problem of recovering the projector onto the eigenspace of S from these observations naturally arises in many applications. A recent technique from [Koltchinskii and Lounici, 2015b] helps to study the asymptotic distribution of the distance in the Frobenius norm between the true projector P_r on the subspace of the r-th eigenvalue and its empirical counterpart \hat{P}_r in terms of the effective trace of S. This paper offers a bootstrap procedure for building sharp confidence sets for the true projector P_r from the given data. This procedure does not rely on the asymptotic distribution of || P_r - \hat{P}_r ||_2 and its moments, and it applies for small or moderate sample size n and large dimension p. The main result states the validity of the proposed procedure for finite samples with an explicit bound on the error of the bootstrap approximation. This bound involves some new sharp results on Gaussian comparison and Gaussian anti-concentration in high dimension. Numerical results confirm the good performance of the method in realistic examples.

Given discrete time observations over a growing time interval, we consider a nonparametric Bayesian approach to estimation of the Lévy density of a Lévy process belonging to a flexible class of infinite activity subordinators. Posterior inference is performed via MCMC, and we circumvent the problem of the intractable likelihood via the data augmentation device, which in our case relies on bridge process sampling via Gamma process bridges. Our approach also requires the use of a new infinite-dimensional form of a reversible jump MCMC algorithm. We show that our method leads to good practical results in challenging simulation examples. On the theoretical side, we establish that our nonparametric Bayesian procedure is consistent: in the low frequency data setting, with observations equispaced in time and intervals between successive observations remaining fixed, the posterior asymptotically, as the sample size tends to infinity, concentrates around the Lévy density under which the data have been generated. Finally, we test our method on a classical insurance dataset.

We consider symmetric random matrices whose upper triangular entries are independent random variables with zero mean and unit variance. Under the assumption of finite fourth moment it is shown that the fluctuations of the Stieltjes transform m_n(z), z = u + i v, v > 0, of the empirical spectral distribution function of the matrix about the Stieltjes transform m_{sc}(z) of Wigner’s semicircle law are of order (nv)^{-1} log n. An application of the result obtained to the convergence rate in probability of the empirical spectral distribution function to Wigner’s semicircle law in the uniform metric is discussed.

We consider a random symmetric matrix \(X = [X_{jk}]_{j,k=1}^n\) with upper triangular entries being independent random variables with mean zero and unit variance. Assuming that \( \max_{jk} \mathbb{E} |X_{jk}|^{4+\delta} < \infty, \delta > 0\), it was proved in \cite{GotzeNauTikh2016a} that with high probability the typical distance between the Stieltjes transform \(m_n(z)\), \(z = u + i v\), of the empirical spectral distribution (ESD) and the Stieltjes transform \(m_{sc}(z)\) of the semicircle law is of order \((nv)^{-1} \log n\). The aim of this paper is to remove \(\delta>0\) and show that this result still holds if we assume that \( \max_{jk} \mathbb{E} |X_{jk}|^{4} < \infty\). We also discuss applications to the rate of convergence of the ESD to the semicircle law in the Kolmogorov distance, rates of localization of the eigenvalues around the classical positions and rates of delocalization of eigenvectors.

We study the estimation of the covariance matrix Σ of a p-dimensional normal random vector based on n independent observations corrupted by additive noise. Only a general nonparametric assumption is imposed on the distribution of the noise without any sparsity constraint on its covariance matrix. In this high-dimensional semiparametric deconvolution problem, we propose spectral thresholding estimators that are adaptive to the sparsity of Σ. We establish an oracle inequality for these estimators under model misspecification and derive non-asymptotic minimax convergence rates that are shown to be logarithmic in log p/n. We also discuss the estimation of low-rank matrices based on indirect observations as well as the generalization to elliptical distributions. The finite sample performance of the threshold estimators is illustrated in a numerical example.

In this paper we study the problem of statistical inference for a continuous-time moving average Lévy process of the form

\[ Z_{t}=\int_{\mathbb{R}}\mathcal{K}(t-s)\, dL_{s},\quad t\in\mathbb{R}, \]

with a deterministic kernel \(\mathcal{K}\) and a Lévy process \(L\). In particular, the estimation of the Lévy measure \(\nu\) of \(L\) from low-frequency observations of the process \(Z\) is considered. We construct a consistent estimator, derive its convergence rates and illustrate its performance by a numerical example. On the mathematical level, we establish some new results on exponential mixing for continuous-time moving average Lévy processes.

In this paper, we give sufficient conditions guaranteeing the validity of the well-known minimax theorem for the lower Snell envelope. Such minimax results play an important role in the characterisation of arbitrage-free prices of American contingent claims in incomplete markets. Our conditions do not rely on the notions of stability under pasting or time-consistency and reveal an unexpected connection between the minimax result and path properties of the corresponding process of densities. We exemplify our general results in the case of families of measures corresponding to diffusion exponential martingales.

For Monte Carlo estimators, a variance reduction method based on empirical variance minimization in the class of functions with zero mean (control functions) is described. An upper bound for the efficiency of the method is obtained in terms of the properties of the functional class.
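
A minimal sketch of the idea, under simplifying assumptions: the class of control functions is reduced to the one-parameter family $c\,(u - 1/2)$, the target is $\mathbb{E}[e^U]$ for $U$ uniform on $[0,1]$, and the coefficient $c$ is chosen to minimize the empirical variance. The target, the control function and all names are chosen for illustration and do not come from the paper.

```python
import math
import random


def control_variate_estimate(n=10000, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with a zero-mean control function."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n)]
    f = [math.exp(u) for u in us]
    g = [u - 0.5 for u in us]  # control function: E[g(U)] = 0 is known exactly

    f_bar = sum(f) / n
    g_bar = sum(g) / n
    cov_fg = sum((fi - f_bar) * (gi - g_bar) for fi, gi in zip(f, g)) / (n - 1)
    var_g = sum((gi - g_bar) ** 2 for gi in g) / (n - 1)

    # The empirical variance of f - c*g is a quadratic in c, minimized here.
    c = cov_fg / var_g
    adjusted = [fi - c * gi for fi, gi in zip(f, g)]
    adj_bar = sum(adjusted) / n

    var_plain = sum((fi - f_bar) ** 2 for fi in f) / (n - 1)
    var_adj = sum((ai - adj_bar) ** 2 for ai in adjusted) / (n - 1)
    return f_bar, adj_bar, var_plain, var_adj


plain_est, cv_est, var_plain, var_cv = control_variate_estimate()
```

Both `plain_est` and `cv_est` estimate $e - 1$. The comparison of `var_plain` with `var_cv` shows how much the per-sample variance drops for this particular control function.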

In this paper we suggest a modification of the regression-based variance reduction approach recently proposed in Belomestny et al. This modification is based on the stratification technique and allows for a further significant variance reduction. The performance of the proposed approach is illustrated by several numerical examples.

We derive tight non-asymptotic bounds for the Kolmogorov distance between the probabilities of two Gaussian elements to hit a ball in a Hilbert space. The key property of these bounds is that they are dimension-free and depend on the nuclear (Schatten-one) norm of the difference between the covariance operators of the elements and on the norm of the mean shift. The obtained bounds significantly improve the bound based on Pinsker's inequality via the Kullback-Leibler divergence. We also establish an anti-concentration bound for a squared norm of a non-centered Gaussian element in Hilbert space. The paper presents a number of examples motivating our results and applications of the obtained bounds to statistical inference and to high-dimensional CLT.

Under correlation-type conditions, we derive upper bounds of order $1/\sqrt{n}$ for the Kolmogorov distance between the distributions of weighted sums of dependent summands and the normal law.

We study sharpened forms of the concentration of measure phenomenon typically centered at stochastic expansions of order $d-1$ for any $d \in \mathbb{N}$. The bounds are based on $d$th order derivatives or difference operators. In particular, we consider deviations of functions of independent random variables and differentiable functions over probability measures satisfying a logarithmic Sobolev inequality, and functions on the unit sphere. Applications include concentration inequalities for U-statistics as well as for classes of symmetric functions via polynomial approximations on the sphere (Edgeworth-type expansions).

Given a convex body $K \subset \mathbb{R}^n$ with the barycenter at the origin, we consider the corresponding Kähler–Einstein equation $e^{-\Phi} = \det D^2\Phi$. If $K$ is a simplex, then the Ricci tensor of the Hessian metric $D^2\Phi$ is constant and equals $\frac{n-1}{4(n+1)}$. We conjecture that the Ricci tensor of $D^2\Phi$ for an arbitrary convex body $K \subseteq \mathbb{R}^n$ is uniformly bounded from above by $\frac{n-1}{4(n+1)}$ and we verify this conjecture in the two-dimensional case. The general case remains open.

We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn’s algorithm, we prove the complexity bound $\widetilde{O}\left(\frac{n^2}{\varepsilon^2}\right)$ arithmetic operations ($\widetilde{O}$ hides polylogarithmic factors $(\ln n)^c$, $c>0$). For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{\frac{n^{9/4}}{\varepsilon}, \frac{n^{2}}{\varepsilon^2} \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left(\frac{n^2}{\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.
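
For readers unfamiliar with the first algorithm, the sketch below shows plain Sinkhorn iterations for entropy-regularized OT between two small discrete distributions. It is a textbook version under assumed inputs (a 3-point cost matrix, regularization parameter `gamma`, fixed iteration count). It omits the rounding step and stopping rule needed for the accuracy guarantees, and it is not the APDAGD method.

```python
import math


def sinkhorn(cost, r, c, gamma, n_iter=200):
    """Entropy-regularized OT plan via alternating marginal scaling.

    cost: n x m cost matrix, r: source weights, c: target weights,
    gamma: regularization strength (larger means a smoother plan).
    """
    n, m = len(r), len(c)
    kernel = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows to match r, then columns to match c.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Three points on a line with cost |i - j| / 2 (an assumed toy example).
cost = [[abs(i - j) / 2.0 for j in range(3)] for i in range(3)]
r = [0.5, 0.3, 0.2]
c = [0.2, 0.3, 0.5]
plan = sinkhorn(cost, r, c, gamma=0.5)
transport_cost = sum(plan[i][j] * cost[i][j] for i in range(3) for j in range(3))
```

Each iteration costs $O(n^2)$ arithmetic operations for distributions of size $n$, which is where the $n^2$ factor in the complexity bounds comes from.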

A sample X_1,...,X_n consisting of independent identically distributed vectors in R^p with zero mean and a covariance matrix \Sigma is considered. The recovery of spectral projectors of high-dimensional covariance matrices from a sample of observations is a key problem in statistics arising in numerous applications. In their 2015 work, V. Koltchinskii and K. Lounici obtained non-asymptotic bounds for the Frobenius norm \|\hat P_r - P_r \|_2^2 of the distance between sample and true projectors and studied asymptotic behavior for large samples. More specifically, asymptotic confidence sets for the true projector P_r were constructed assuming that the moment characteristics of the observations are known. This paper describes a bootstrap procedure for constructing confidence sets for the spectral projector P_r of the covariance matrix \Sigma from given data. This approach does not use the asymptotic distribution of \|\hat P_r - P_r \|_2^2 and does not require the computation of its moment characteristics. The performance of the bootstrap approximation procedure is analyzed.

Upper bounds for the closeness of two centered Gaussian measures in the class of balls in a separable Hilbert space are obtained. The bounds are optimal with respect to the dependence on the spectra of the covariance operators of the Gaussian measures. The inequalities cannot be improved in the general case.

Let X_1,…,X_n be an i.i.d. sample in R^p with zero mean and the covariance matrix \Sigma^{*}. The classical PCA approach recovers the projector P_J^{*} onto the principal eigenspace of \Sigma^{*} by its empirical counterpart \hat P_J. The recent paper [24] investigated the asymptotic distribution of the Frobenius distance between the projectors \|\hat P_J - P_J^{*}\|_2, while [27] offered a bootstrap procedure to measure uncertainty in recovering this subspace P_J^{*} even in a finite sample setup. The present paper considers this problem from a Bayesian perspective and suggests using the credible sets of the pseudo-posterior distribution on the space of covariance matrices induced by the conjugate Inverse Wishart prior as sharp confidence sets. This yields a numerically efficient procedure. Moreover, we theoretically justify this method and derive finite sample bounds on the corresponding coverage probability. Contrary to [24, 27], the obtained results are valid for non-Gaussian data: the main assumption that we impose is the concentration of the sample covariance \hat \Sigma in a vicinity of \Sigma^{*}. Numerical simulations illustrate good performance of the proposed procedure even on non-Gaussian data in a rather challenging regime.

In this paper we propose a novel dual regression-based approach for pricing American options. This approach reduces the complexity of the nested Monte Carlo method and has an especially simple form for time-discretized diffusion processes. We analyse the complexity of the proposed approach both in the case of a fixed and of an increasing number of exercise dates. The method is illustrated by several numerical examples.

We consider a new method of semiparametric statistical estimation for continuous-time moving-average Lévy processes. We derive the convergence rates of the proposed estimators and show that these rates are optimal in the minimax sense.

This is an advanced guide to optimal stopping and control, focusing on advanced Monte Carlo simulation and its application to finance. Written for quantitative finance practitioners and researchers in academia, the book looks at the classical simulation-based algorithms before introducing some of the new, cutting-edge approaches under development.

In this paper we study the problem of statistical inference on the parameters of the semiparametric variance-mean mixtures. This class of mixtures has recently become rather popular in statistical and financial modelling. We design a semiparametric estimation procedure that first estimates the mean of the underlying normal distribution and then recovers nonparametrically the density of the corresponding mixing distribution. We illustrate the performance of our procedure on simulated and real data.

The estimation of the diffusion matrix Σ of a high-dimensional, possibly time-changed Lévy process is studied, based on discrete observations of the process with a fixed distance. A low-rank condition is imposed on Σ. Applying a spectral approach, we construct a weighted least-squares estimator with nuclear-norm penalisation. We prove oracle inequalities and derive convergence rates for the diffusion matrix estimator. The convergence rates show a surprising dependency on the rank of Σ and are optimal in the minimax sense for fixed dimensions. Theoretical results are illustrated by a simulation study.