# Publications

The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms that has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing a finite-time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $\Omega(1/k)$. A simple numerical experiment is presented to support our theory.

We consider products of independent \(n \times n\) non-Hermitian random matrices \(\X^{(1)}, \ldots, \X^{(m)}\). Assume that their entries, \(X_{jk}^{(q)}, 1 \le j,k \le n, q = 1, \ldots, m\), are independent identically distributed random variables with zero mean and unit variance. G\"otze--Tikhomirov~\cite{GotTikh2011} and O'Rourke--Soshnikov~\cite{Soshnikov2011} proved that under these assumptions the empirical spectral distribution (ESD) of \(X^{(1)} \cdots X^{(m)}\) converges to the limiting distribution which coincides with the distribution of the \(m\)-th power of a random variable uniformly distributed in the unit disc. In the current paper we provide a local version of this result. More precisely, assuming additionally that \(\E |X_{11}^{(q)}|^{4+\delta} < \infty\) for some \(\delta > 0\), we prove that the ESD of \(X^{(1)} \cdots X^{(m)}\) converges to the limiting distribution on the optimal scale up to \(n^{-1+2a}, 0 < a < 1/2\) (up to some logarithmic factor). Our results generalize the recent results of Bourgade--Yau--Yin~\cite{Bourgade2014a}, Tao--Vu~\cite{TaoVu2015a} and Nemish~\cite{nemish2017}. We also give a further development of Stein's type approach to estimating the Stieltjes transform of the ESD.
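The limiting law is easy to visualise by simulation. A minimal numpy sketch (illustrative only, not part of the paper's argument): the moduli of the eigenvalues of a normalized product of $m$ Ginibre matrices are compared with the CDF $r^{2/m}$ of $|u^m|$, where $u$ is uniform on the unit disc.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 3

# Product of m independent Ginibre matrices, each normalized by sqrt(n)
P = np.eye(n)
for _ in range(m):
    P = P @ (rng.standard_normal((n, n)) / np.sqrt(n))

moduli = np.sort(np.abs(np.linalg.eigvals(P)))

# Limiting ESD: law of u^m with u uniform on the unit disc,
# so |lambda| has CDF r^(2/m) on [0, 1]
r = np.linspace(0.05, 0.95, 10)
ecdf = np.searchsorted(moduli, r) / n
max_dev = float(np.max(np.abs(ecdf - r ** (2.0 / m))))
print(f"max deviation of empirical CDF from r^(2/m): {max_dev:.3f}")
```

The deviation shrinks as $n$ grows, in line with the convergence of the ESD.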

We consider high-dimension low-sample-size data taken from the standard multivariate normal distribution under the assumption that the dimension is a random variable. Second-order Chebyshev–Edgeworth expansions for the distributions of the angle between two sample observations and of the corresponding sample correlation coefficient are constructed with error bounds. Depending on the type of normalization, we get three different limit distributions: normal, Student's t-, or Laplace distributions. The paper continues the authors' studies on the approximation of statistics for samples of random size.

This book presents recent non-asymptotic results for approximations in multivariate statistical analysis. The book is unique in its focus on results with the correct error structure for all the parameters involved. Firstly, it discusses the computable error bounds on correlation coefficients, MANOVA tests and discriminant functions studied in recent papers. It then introduces new areas of research in high-dimensional approximations for bootstrap procedures, Cornish–Fisher expansions, power-divergence statistics and approximations of statistics based on observations with random sample size. Lastly, it proposes a general approach for the construction of non-asymptotic bounds, providing relevant examples for several complicated statistics. It is a valuable resource for researchers with a basic understanding of multivariate statistics.

In this paper, we propose a Weighted Stochastic Mesh (WSM) algorithm for approximating the value of discrete- and continuous-time optimal stopping problems. In this context, we consider tractability of such problems via a useful notion of semitractability and the introduction of a tractability index for a particular numerical solution algorithm. It is shown that in the discrete-time case the WSM algorithm leads to semitractability of the corresponding optimal stopping problem in the sense that its complexity is bounded in order by $\varepsilon^{-4}\log^{d+2}(1/\varepsilon)$, with $d$ being the dimension of the underlying Markov chain. Furthermore, we study the WSM approach in the context of continuous-time optimal stopping problems and derive the corresponding complexity bounds. Although we cannot prove semitractability in this case, our bounds turn out to be the tightest ones among the complexity bounds known in the literature. We illustrate our theoretical findings by a numerical example.

We develop a matrix approach to the Maximal Acyclic Subgraph (MAS) problem by reducing it to finding the closest nilpotent matrix to the matrix of the graph. Using recent results on the closest Schur stable systems and on minimising the spectral radius over special sets of non-negative matrices, we obtain an algorithm for finding an approximate solution of MAS. Numerical results for graphs from $50$ to $1500$ vertices demonstrate its fast convergence and yield an approximation ratio larger than $0.6$ in most cases. The same method gives the precise solution for the following weakened version of MAS: find the minimal $r$ such that the graph can be made acyclic by cutting at most~$r$ incoming edges from each vertex. Several modifications are also considered: when each vertex is assigned its own maximal number~$r_i$ of cut edges, or when some of the edges are ``untouchable''. Some applications are discussed.
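The reduction rests on the classical fact that a directed graph is acyclic exactly when its adjacency matrix is nilpotent, i.e. $A^n = 0$ for an $n$-vertex graph. A minimal sketch of this criterion (for large graphs one would test the spectral radius instead of forming the matrix power):

```python
import numpy as np

def is_acyclic(A: np.ndarray) -> bool:
    """A directed graph is acyclic iff its adjacency matrix is nilpotent,
    i.e. A^n = 0, where n is the number of vertices."""
    n = A.shape[0]
    B = np.linalg.matrix_power(A.astype(np.int64), n)
    return not B.any()

# 3-cycle: 0 -> 1 -> 2 -> 0 (contains a cycle)
C = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
# Same graph with the edge 2 -> 0 cut: acyclic
D = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print(is_acyclic(C), is_acyclic(D))  # False True
```

Cutting edges to make the adjacency matrix nilpotent is precisely the matrix formulation of MAS exploited in the paper.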

In this paper we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample variance. This feature is theoretically demonstrated by means of a deep non-asymptotic analysis of a variance-reduced functional as well as by a thorough simulation study. In particular, we apply our method to various MCMC Bayesian estimation problems, where it compares favourably to the existing variance reduction approaches.
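The mechanism behind the approach can be illustrated on a toy i.i.d. example (much simpler than the Markov-chain setting of the paper): the coefficients of zero-mean control variates are chosen by minimizing the empirical variance, which reduces to least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)

f = x**2 + 3 * x                          # target: E f(X) = 1 for X ~ N(0, 1)
G = np.stack([x, x**3 - 3 * x], axis=1)   # control variates with zero mean

# Choose theta by minimizing the empirical variance of f - G @ theta,
# i.e. ordinary least squares of the centred f on the centred controls
theta, *_ = np.linalg.lstsq(G - G.mean(0), f - f.mean(), rcond=None)
reduced = f - G @ theta                   # same expectation, smaller variance

print(f"plain estimate   {f.mean():.4f}  (var {f.var():.2f})")
print(f"reduced estimate {reduced.mean():.4f}  (var {reduced.var():.2f})")
```

Here the controls are Hermite polynomials, so the variance drops from about 11 to about 2 while the estimated mean stays unbiased.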

In this paper we study the problem of pointwise density estimation from observations with multiplicative measurement errors. We elucidate the main feature of this problem: the influence of the estimation point on the estimation accuracy. In particular, we show that, depending on whether this point is separated away from zero or not, there are two different regimes in terms of the rates of convergence of the minimax risk. In both regimes we develop kernel-type density estimators and prove upper bounds on their maximal risk over suitable nonparametric classes of densities. We show that the proposed estimators are rate-optimal by establishing matching lower bounds on the minimax risk. Finally, we test our estimation procedures on simulated data.

We study linear rough partial differential equations in the setting of [Friz and Hairer, Springer, 2014, Chapter 12]. More precisely, we consider a linear parabolic partial differential equation driven by a deterministic rough path W of Hölder regularity α with 1/3 < α ≤ 1/2. Based on a stochastic representation of the solution of the rough partial differential equation, we propose a regression Monte Carlo algorithm for spatio-temporal approximation of the solution. We provide a full convergence analysis of the proposed approximation method which essentially relies on the new bounds for the higher order derivatives of the solution in space. Finally, we present a simulation study showing the applicability of the proposed algorithm.

We consider a problem of multiclass classification, where the training sample Sn={(Xi,Yi)}ni=1 is generated from the model ℙ(Y=m|X=x)=ηm(x), 1≤m≤M, and η1(x),…,ηM(x) are unknown α-Hölder continuous functions. Given a test point X, our goal is to predict its label. A widely used 𝗄-nearest-neighbors classifier constructs estimates of η1(X),…,ηM(X) and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter 𝗄, which may become tricky in some situations. In our solution, we fix several integers n1,…,nK, compute corresponding nk-nearest-neighbor estimates for each m and each nk and apply an aggregation procedure. We study an algorithm that constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter α and to the margin, and establish rates of convergence under mild assumptions.
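A toy sketch of the pipeline in one dimension (the aggregation step below is a crude exponential-weights stand-in, not the paper's convex-combination procedure, and all constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(-1, 1, n)
eta = 0.5 + 0.4 * np.sign(X) * np.abs(X) ** 0.7   # Hoelder-continuous P(Y=1|X=x)
Y = (rng.uniform(size=n) < eta).astype(float)

x0 = 0.3                                    # test point
order = np.argsort(np.abs(X - x0))          # neighbours of x0, nearest first

ks = [5, 20, 80, 320]
est = np.array([Y[order[:k]].mean() for k in ks])   # k-NN estimates of eta(x0)

# Crude aggregation: exponential weights based on agreement with a
# held-out block of further neighbours (an arbitrary illustrative choice)
holdout = Y[order[320:640]].mean()
w = np.exp(-50.0 * (est - holdout) ** 2)
w /= w.sum()
eta_agg = float(w @ est)
eta_true = 0.5 + 0.4 * 0.3 ** 0.7
print(f"true eta(x0) = {eta_true:.3f}, aggregated estimate = {eta_agg:.3f}")
```

The point of the paper's procedure is that such a convex combination adapts to the unknown smoothness without choosing a single k.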

In this paper, we propose new first-order methods for minimization of a convex function on a simple convex set. We assume that the objective function is a composite function given as a sum of a simple convex function and a convex function with an inexact Hölder-continuous subgradient. We propose a Universal Intermediate Gradient Method. Our method enjoys both the universality and intermediateness properties. Following the ideas of Nesterov (Math. Program. 152 (2015), pp. 381–404) on Universal Gradient Methods, our method does not require any information about the Hölder parameter and constant, and adjusts itself automatically to the local level of smoothness. On the other hand, in the spirit of the Intermediate Gradient Method proposed by Devolder et al. (CORE Discussion Paper 2013/17, 2013), our method is intermediate in the sense that it interpolates between the Universal Gradient Method and the Universal Fast Gradient Method. This allows balancing the rate of convergence of the method against the rate of oracle error accumulation. Under the additional assumption of strong convexity of the objective, we show how the restart technique can be used to obtain an algorithm with a faster rate of convergence.

We consider rates of approximation of distributions of weighted sums of independent, identically distributed random variables by the Edgeworth correction of the 4-th order.

In this paper we study optimal stopping problems for nonlinear Markov processes driven by a McKean-Vlasov SDE and aim at solving them numerically by Monte Carlo. To this end we propose a novel regression algorithm based on the corresponding particle system and prove its convergence. The proof of convergence is based on perturbation analysis of a related linear regression problem. The performance of the proposed algorithms is illustrated by a numerical example.

This paper is devoted to uniform versions of the Hanson-Wright inequality for a random vector X ∈ R^n with independent subgaussian components. The core technique of the paper is based on the entropy method combined with truncations of both gradients of functions of interest and of the components of X itself. Our results recover, in particular, the classic uniform bound of Talagrand [28] for Rademacher chaoses and the more recent uniform result of Adamczak [2], which holds under certain rather strong assumptions on the distribution of X. We provide several applications of our techniques: we establish a version of the standard Hanson-Wright inequality, which is tighter in some regimes. Extending our results, we show a version of the dimension-free matrix Bernstein inequality that holds for random matrices with a subexponential spectral norm. We apply the derived inequality to the problem of covariance estimation with missing observations and prove an almost optimal high-probability version of the recent result of Lounici [21]. Finally, we show a uniform Hanson-Wright-type inequality in the Ising model under Dobrushin’s condition. A closely related question was posed by Marton [22].

In this paper we propose a novel and practical variance reduction approach for additive functionals of dependent sequences. Our approach combines the use of control variates with the minimisation of an empirical variance estimate. We analyse finite sample properties of the proposed method and derive finite-time bounds of the excess asymptotic variance to zero. We apply our methodology to Stochastic Gradient MCMC (SGMCMC) methods for Bayesian inference on large data sets and combine it with existing variance reduction methods for SGMCMC. We present empirical results carried out on a number of benchmark examples showing that our variance reduction method achieves significant improvement as compared to state-of-the-art methods at the expense of a moderate increase of computational overhead.

In this paper, a new variant of accelerated gradient descent is proposed. The proposed method does not require any information about the objective function, uses exact line search for practical acceleration of convergence, converges according to the well-known lower bounds for both convex and non-convex objective functions, possesses primal–dual properties and can be applied in the non-Euclidean set-up. As far as we know, this is the first method possessing all of the above properties at the same time. We also present a universal version of the method, which is applicable to non-smooth problems. We demonstrate how in practice one can efficiently use the combination of line search and primal–duality by considering a convex optimization problem with a simple structure (for example, a linearly constrained one).
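On a quadratic objective, the interplay of exact line search and momentum is easy to sketch (an illustrative toy, not the paper's general primal–dual method):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 50
Q = rng.standard_normal((d, d))
A = Q @ Q.T / d + np.eye(d)            # well-conditioned positive-definite Hessian
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)         # exact minimizer, for reference

f = lambda v: 0.5 * v @ A @ v - b @ v
grad = lambda v: A @ v - b

x = y = np.zeros(d)
for k in range(1000):
    g = grad(y)
    if g @ g == 0.0:                   # already at a stationary point
        break
    t = (g @ g) / (g @ A @ g)          # exact line search along -g (quadratic f)
    x_new = y - t * g
    y = x_new + k / (k + 3) * (x_new - x)   # Nesterov-type momentum
    x = x_new

gap = f(x) - f(x_star)
print(f"objective gap after 1000 iterations: {gap:.2e}")
```

For quadratics the line-search step has the closed form above; in general one would use a one-dimensional search routine.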

A mixed data frame (MDF) is a table collecting categorical, numerical, and count observations. The use of MDF is widespread in statistics and the applications are numerous, from abundance data in ecology to recommender systems. In many cases, an MDF exhibits simultaneously *main effects*, such as row, column, or group effects, and *interactions*, for which a low-rank model has often been suggested. Although the literature on low-rank approximations is very substantial, with few exceptions, existing methods do not allow incorporating main effects and interactions while providing statistical guarantees. The present work fills this gap. We propose an estimation method which recovers the main effects and the interactions simultaneously. We show that our method is near optimal under conditions which are met in our targeted applications. We also propose an optimization algorithm which provably converges to an optimal solution. Numerical experiments reveal that our method, mimi, performs well when the main effects are sparse and the interaction matrix has low rank. We also show that mimi compares favorably to existing methods, in particular when the main effects are significantly large compared to the interactions, and when the proportion of missing entries is large. The method is available as an R package on the Comprehensive R Archive Network. Supplementary materials for this article are available online.

The problem of finding the closest stable matrix for a dynamical system has many applications. It is studied for both continuous and discrete-time systems, and the corresponding optimization problems are formulated for various matrix norms. As a rule, nonconvexity of these formulations does not allow finding their global solutions. In this paper, we analyze positive discrete-time systems. They also suffer from nonconvexity of the stability region, and the problem in the Frobenius norm or in the Euclidean norm remains hard for them. However, it turns out that for certain polyhedral norms, the situation is much better. We show that for distances measured in the max-norm, we can find an exact solution of the corresponding nonconvex projection problems in polynomial time. For distances measured in the operator $\ell_1$-norm or $\ell_\infty$-norm, the exact solution is also efficiently found. To this end, we develop a modification of the recently introduced spectral simplex method. On the other hand, for all these three norms, we obtain exact descriptions of the region of stability around a given stable matrix. In the case of the max-norm, this can be seen as an extension, to the class of nonnegative matrices, of the Kharitonov theorem, which provides a stability criterion for polynomials with interval coefficients [V. L. Kharitonov, Differ. Uravn., 14 (1978), pp. 2086--2088; K. Panneerselvam and R. Ayyagari, Internat. J. Control Sci. Engrg., 3 (2013), pp. 81--85]. For practical implementation of our technique, we developed a new method for approximating the maximal eigenvalue of a nonnegative matrix. It combines a local quadratic rate of convergence with polynomial-time global performance guarantees.

In this paper, we consider the resource allocation problem, stated as a convex minimization problem with linear constraints. To solve this problem, we use gradient and accelerated gradient descent applied to the dual problem and prove convergence rates both for the primal and the dual iterates. We obtain faster convergence rates than the ones known in the literature. We also provide an economic interpretation for these two methods: iterations of the algorithms naturally correspond to the process of price and production adjustment aimed at obtaining the desired production volume in the economy. Overall, we show how these actions of the economic agents lead the whole system to the equilibrium.
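The price-adjustment dynamics can be sketched on the simplest quadratic instance (the objective $\tfrac12\|x\|^2$ is an assumption made here for illustration; the paper treats general convex objectives and an accelerated variant):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 20
A = rng.standard_normal((m, n))            # technology matrix
b = rng.standard_normal(m)                 # demanded production volumes

# min 0.5*||x||^2  s.t.  A x = b, solved by gradient ascent on the dual
lam = np.zeros(m)                          # dual variables, i.e. prices
step = 1.0 / np.linalg.norm(A @ A.T, 2)    # 1/L for the concave dual function

for _ in range(10_000):
    x = -A.T @ lam                         # producers' best response to prices
    lam += step * (A @ x - b)              # prices adjust toward the demand b

x_star = A.T @ np.linalg.solve(A @ A.T, b) # minimum-norm solution, for reference
print(np.linalg.norm(A @ x - b), np.linalg.norm(x - x_star))
```

Each dual step raises prices of over-demanded goods and lowers prices of over-produced ones, which is exactly the economic interpretation described in the abstract.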

We study the transportation problem on the unit sphere $S^{n-1}$ for symmetric probability measures and the cost function $c(x,y)=\log\frac{1}{\langle x,y\rangle}$. We calculate the variation of the corresponding Kantorovich functional $K$ and study a naturally associated metric-measure space on $S^{n-1}$ endowed with a Riemannian metric generated by the corresponding transportational potential. We introduce a new transportational functional whose minimizers are solutions to the symmetric log-Minkowski problem and prove that $K$ satisfies the following analog of the Gaussian transportation inequality for the uniform probability measure $\sigma$ on $S^{n-1}$: $\frac{1}{n}\mathrm{Ent}(\nu)\ge K(\sigma,\nu)$. It is shown that there exists a remarkable similarity between our results and the theory of the Kähler–Einstein equation on Euclidean space. As a by-product we obtain a new proof of uniqueness of the solution to the log-Minkowski problem for the uniform measure.

In this paper we study the problem of density deconvolution under general assumptions on the measurement error distribution. Typically deconvolution estimators are constructed using Fourier transform techniques, and it is assumed that the characteristic function of the measurement errors does not have zeros on the real line. This assumption is rather strong and is not fulfilled in many cases of interest. In this paper we develop a methodology for constructing optimal density deconvolution estimators in the general setting that covers vanishing and non-vanishing characteristic functions of the measurement errors. We derive upper bounds on the risk of the proposed estimators and provide sufficient conditions under which zeros of the corresponding characteristic function have no effect on the estimation accuracy. Moreover, we show that the derived conditions are also necessary in some specific problem instances.
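In the non-vanishing regime the classical Fourier construction already conveys the idea. A toy sketch with Laplace measurement errors, whose characteristic function $1/(1+b^2t^2)$ has no real zeros (the paper's contribution concerns the harder vanishing case):

```python
import numpy as np

rng = np.random.default_rng(7)
n, b = 5000, 0.5
X = rng.standard_normal(n)                # unobserved sample, density to estimate
Y = X + rng.laplace(scale=b, size=n)      # observations with measurement error

t = np.linspace(-6, 6, 601)               # spectral grid with cut-off |t| <= 6
dt = t[1] - t[0]
phi_Y = np.exp(1j * np.outer(t, Y)).mean(axis=1)  # empirical char. function of Y
phi_eps = 1.0 / (1.0 + (b * t) ** 2)              # Laplace error char. function

# Fourier-inversion deconvolution estimator of the density of X at x0
x0 = 0.0
f_hat = float(np.real(np.sum(np.exp(-1j * t * x0) * phi_Y / phi_eps)) * dt / (2 * np.pi))
f_true = 1.0 / np.sqrt(2.0 * np.pi)               # N(0, 1) density at 0
print(f"f_hat(0) = {f_hat:.3f}, true = {f_true:.3f}")
```

Division by $\varphi_\varepsilon$ amplifies the noise at high frequencies, which is why the spectral cut-off (here fixed at 6 for simplicity) must be tuned; when $\varphi_\varepsilon$ has real zeros this construction breaks down and the methodology of the paper is needed.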

This paper provides rates of convergence for empirical (generalised) barycenters on compact geodesic metric spaces under general conditions using empirical processes techniques. Our main assumption is termed a variance inequality and provides a strong connection between usual assumptions in the field of empirical processes and central concepts of metric geometry. We study the validity of variance inequalities in spaces of non-positive and non-negative Aleksandrov curvature. In this last scenario, we show that variance inequalities hold provided geodesics, emanating from a barycenter, can be extended by a constant factor. We also relate variance inequalities to strong geodesic convexity. While not restricted to this setting, our results are largely discussed in the context of the 2-Wasserstein space.

This work establishes fast rates of convergence for empirical barycenters over a large class of geodesic spaces with curvature bounds in the sense of Alexandrov. More specifically, we show that parametric rates of convergence are achievable under natural conditions that characterize the bi-extendibility of geodesics emanating from a barycenter. These results largely advance the state-of-the-art on the subject both in terms of rates of convergence and the variety of spaces covered. In particular, our results apply to infinite-dimensional spaces such as the 2-Wasserstein space, where bi-extendibility of geodesics translates into regularity of Kantorovich potentials.

This paper addresses the problem of prediction with expert advice for outcomes in a geodesic space with non-positive curvature in the sense of Alexandrov. Via geometric considerations, and in particular the notion of barycenters, we extend to this setting the definition and analysis of the classical exponentially weighted average forecaster. We also adapt the principle of online to batch conversion to this setting. We shortly discuss the application of these results in the context of aggregation and for the problem of barycenter estimation.

In this paper, we study the conjecture of Gardner and Zvavitch from \cite{GZ}, which suggests that the standard Gaussian measure $\gamma$ enjoys $\frac{1}{n}$-concavity with respect to the Minkowski addition of \textbf{symmetric} convex sets. We prove this fact up to a factor of 2: that is, we show that for symmetric convex $K$ and $L$,

$$\gamma(\lambda K+(1-\lambda)L)^{\frac{1}{2n}} \ge \lambda\,\gamma(K)^{\frac{1}{2n}}+(1-\lambda)\,\gamma(L)^{\frac{1}{2n}}.$$

Further, we show that under suitable dimension-free uniform bounds on the Hessian of the potential, the log-concavity of even measures can be strengthened to p-concavity, with p>0, with respect to the addition of symmetric convex sets.

The multistochastic (*n*, *k*)-Monge–Kantorovich problem on a product space $\prod_{i=1}^n X_i$ is an extension of the classical Monge–Kantorovich problem. This problem is considered on the space of measures with fixed projections onto $X_{i_1}\times\cdots\times X_{i_k}$ for all *k*-tuples $\{i_1,\ldots,i_k\}\subset\{1,\ldots,n\}$, for a given $1\le k<n$. In our paper we study well-posedness of the primal and the corresponding dual problem. Our central result describes a solution $\pi$ to the following important model case: $n=3$, $k=2$, $X_i=[0,1]$, the cost function $c(x,y,z)=xyz$, and the corresponding two-dimensional projections are Lebesgue measures on $[0,1]^2$. We prove, in particular, that the mapping $(x,y)\to x\oplus y$, where $\oplus$ is the bitwise addition (xor- or Nim-addition) on $[0,1]\cong\mathbb{Z}_2^\infty$, is the corresponding optimal transportation. In particular, the support of $\pi$ is the Sierpiński tetrahedron. In addition, we describe a solution to the corresponding dual problem.
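The optimal map of the model case is explicit and easy to experiment with. A small sketch of Nim-addition on truncated dyadic expansions (the truncation to finitely many bits is only an implementation convenience):

```python
def nim_add(x: float, y: float, bits: int = 52) -> float:
    """Bitwise xor (Nim-addition) of x, y in [0, 1), truncated to `bits` binary digits."""
    scale = 1 << bits
    return (int(x * scale) ^ int(y * scale)) / scale

# 0.3125 = 0.0101_2 and 0.8125 = 0.1101_2, so their xor is 0.1000_2 = 0.5
z = nim_add(0.3125, 0.8125)
print(z, nim_add(z, 0.8125))   # xor in each argument is an involution
```

Since each binary digit of $x \oplus y$ is a fair coin flip when $x$, $y$ are independent uniform, the map pushes the Lebesgue measure on $[0,1]^2$ forward to the Lebesgue measure on $[0,1]$, consistent with the fixed projections in the abstract.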

Given a convex body $K\subset\mathbb{R}^n$ with the barycenter at the origin, we consider the corresponding Kähler–Einstein equation $e^{-\Phi}=\det D^2\Phi$. If *K* is a simplex, then the Ricci tensor of the Hessian metric $D^2\Phi$ is constant and equals $\frac{n-1}{4(n+1)}$. We conjecture that the Ricci tensor of $D^2\Phi$ for an arbitrary convex body $K\subseteq\mathbb{R}^n$ is uniformly bounded from above by $\frac{n-1}{4(n+1)}$, and we verify this conjecture in the two-dimensional case. The general case remains open.

Let $F_n$ denote the distribution function of the normalized sum of $n$ i.i.d. random variables. In this paper, polynomial rates of approximation of $F_n$ by the corrected normal laws are considered in the model where the underlying distribution has a convolution structure. As a basic tool, the convergence part of Khinchine's theorem in the metric theory of Diophantine approximations is extended to the class of product characteristic functions.

A new concept of a $(\delta, L)$-model of a function, generalizing the Devolder–Glineur–Nesterov $(\delta, L)$-oracle, is proposed. Within this concept, the gradient descent and fast gradient descent methods are constructed, and it is shown that the constructions of many known methods (composite methods, level methods, conditional gradient and proximal methods) are particular cases of the methods proposed in this paper.

Two transforms of functions on a half-line are considered. It is proved that their composition gives a concave majorant for every non-negative function. In particular, this composition is an identical transform on the class of non-negative functions. Applications of this result in the operator theory of Hilbert space and in the theory of quantum systems are mentioned. Several open problems are formulated.

Let X_1, ..., X_n be an i.i.d. sample in R^p with zero mean and covariance matrix S. The problem of recovering the projector onto an eigenspace of S from these observations naturally arises in many applications. A recent technique from [Koltchinskii and Lounici, 2015b] allows studying the asymptotic distribution of the distance in the Frobenius norm between the true projector P_r on the subspace of the r-th eigenvalue and its empirical counterpart \hat{P}_r in terms of the effective trace of S. This paper offers a bootstrap procedure for building sharp confidence sets for the true projector P_r from the given data. This procedure does not rely on the asymptotic distribution of || P_r - \hat{P}_r ||_2 and its moments, and it applies for small or moderate sample size n and large dimension p. The main result states the validity of the proposed procedure for finite samples with an explicit bound on the error of bootstrap approximation. This bound involves some new sharp results on Gaussian comparison and Gaussian anti-concentration in high dimension. Numerical results confirm the good performance of the method in realistic examples.
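A schematic version of such a bootstrap confidence set (a toy sketch with Gaussian multiplier weights; the exact weighting scheme and eigenvalue grouping of the paper are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 10
Sigma = np.diag([5.0, 2.0] + [1.0] * (p - 2))
X = rng.standard_normal((n, p)) * np.sqrt(np.diag(Sigma))   # rows ~ N(0, Sigma)

def top_projector(S, r=1):
    """Projector onto the eigenspace of the r largest eigenvalues of S."""
    _, V = np.linalg.eigh(S)
    U = V[:, -r:]
    return U @ U.T

S_hat = X.T @ X / n
P_hat = top_projector(S_hat)

# Multiplier bootstrap: reweight observations and recompute the projector
stats = []
for _ in range(500):
    w = 1.0 + rng.standard_normal(n)
    S_boot = (X * w[:, None]).T @ X / n
    stats.append(np.linalg.norm(top_projector(S_boot) - P_hat))
q90 = float(np.quantile(stats, 0.9))     # radius of the 90% confidence set
print(f"bootstrap 90% radius for ||P_r - P_hat||_F: {q90:.3f}")
```

The confidence set is then the ball of Frobenius radius q90 around the empirical projector; no knowledge of the asymptotic distribution of the distance is needed.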

Given discrete-time observations over a growing time interval, we consider a nonparametric Bayesian approach to estimation of the Lévy density of a Lévy process belonging to a flexible class of infinite activity subordinators. Posterior inference is performed via MCMC, and we circumvent the problem of the intractable likelihood via a data augmentation device, which in our case relies on bridge process sampling via Gamma process bridges. Our approach also requires the use of a new infinite-dimensional form of a reversible jump MCMC algorithm. We show that our method leads to good practical results in challenging simulation examples. On the theoretical side, we establish that our nonparametric Bayesian procedure is consistent: in the low-frequency data setting, with equispaced-in-time observations and intervals between successive observations remaining fixed, the posterior asymptotically, as the sample size tends to infinity, concentrates around the Lévy density under which the data have been generated. Finally, we test our method on a classical insurance dataset.

We consider symmetric random matrices whose upper triangular entries are independent random variables with zero mean and unit variance. Under the assumption of a finite fourth moment, it is shown that the fluctuations of the Stieltjes transform $m_n(z)$, $z = u + iv$, $v > 0$, of the empirical spectral distribution function of the matrix about the Stieltjes transform $m_{sc}(z)$ of Wigner's semicircle law are of order $(nv)^{-1}\log n$. An application of the result obtained to the convergence rate in probability of the empirical spectral distribution function to Wigner's semicircle law in the uniform metric is discussed.

We consider a random symmetric matrix \(\X = [X_{jk}]_{j,k=1}^n\) with upper triangular entries being independent random variables with mean zero and unit variance. Assuming that \( \max_{jk} \E |X_{jk}|^{4+\delta} < \infty, \delta > 0\), it was proved in \cite{GotzeNauTikh2016a} that with high probability the typical distance between the Stieltjes transforms \(m_n(z)\), \(z = u + i v\), of the empirical spectral distribution (ESD) and the Stieltjes transforms \(m_{sc}(z)\) of the semicircle law is of order \((nv)^{-1} \log n\). The aim of this paper is to remove \(\delta>0\) and show that this result still holds if we assume that \( \max_{jk} \E |X_{jk}|^{4} < \infty\). We also discuss applications to the rate of convergence of the ESD to the semicircle law in the Kolmogorov distance, rates of localization of the eigenvalues around the classical positions and rates of delocalization of eigenvectors.
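The quantity under study is easy to probe numerically. A quick sketch comparing the empirical Stieltjes transform of a Wigner matrix with $m_{sc}(z)$ at a single point $z$ (the paper's bounds are much finer and hold with high probability uniformly in $z$):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2 * n)        # Wigner matrix, off-diagonal entry variance 1/n
lam = np.linalg.eigvalsh(W)

z = 0.5 + 0.1j                         # z = u + iv with v = 0.1
m_n = np.mean(1.0 / (lam - z))         # empirical Stieltjes transform
# Stieltjes transform of the semicircle law; the principal branch of the
# square root is the correct one here since Re z > 0 and Im z > 0
m_sc = (-z + np.sqrt(z * z - 4)) / 2
dev = abs(m_n - m_sc)
print(f"|m_n(z) - m_sc(z)| = {dev:.4f}, (nv)^-1 log n = {np.log(n) / (n * 0.1):.4f}")
```

Already at n = 1000 the deviation is comparable to the $(nv)^{-1}\log n$ scale appearing in the bounds.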

We study the estimation of the covariance matrix Σ of a p-dimensional normal random vector based on n independent observations corrupted by additive noise. Only a general nonparametric assumption is imposed on the distribution of the noise, without any sparsity constraint on its covariance matrix. In this high-dimensional semiparametric deconvolution problem, we propose spectral thresholding estimators that are adaptive to the sparsity of Σ. We establish an oracle inequality for these estimators under model misspecification and derive non-asymptotic minimax convergence rates that are shown to be logarithmic in log p/n. We also discuss the estimation of low-rank matrices based on indirect observations as well as the generalization to elliptical distributions. The finite sample performance of the threshold estimators is illustrated in a numerical example.

In this paper we study the problem of statistical inference for a continuous-time moving average L\'evy process of the form

\[ Z_{t}=\int_{\mathbb{R}}\mathcal{K}(t-s)\, dL_{s},\quad t\in\mathbb{R}, \] with a deterministic kernel \(\mathcal{K}\) and a L\'evy process \(L\). In particular, the estimation of the L\'evy measure \(\nu\) of \(L\) from low-frequency observations of the process \(Z\) is considered. We construct a consistent estimator, derive its convergence rates and illustrate its performance by a numerical example. On the mathematical level, we establish some new results on exponential mixing for continuous-time moving average L\'evy processes.

In this paper, we give sufficient conditions guaranteeing the validity of the well-known minimax theorem for the lower Snell envelope. Such minimax results play an important role in the characterisation of arbitrage-free prices of American contingent claims in incomplete markets. Our conditions do not rely on the notions of stability under pasting or time-consistency and reveal an unexpected connection between the minimax result and the path properties of the corresponding process of densities. We exemplify our general results in the case of families of measures corresponding to diffusion exponential martingales.

For Monte Carlo estimators, a variance reduction method based on empirical variance minimization in the class of functions with zero mean (control functions) is described. An upper bound for the efficiency of the method is obtained in terms of the properties of the functional class.
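The zero-mean control-function idea can be sketched as follows; the function name `cv_estimate`, the least-squares fit of the coefficients, and the toy integrand are our own illustrative choices, not the paper's exact construction:

```python
import numpy as np

def cv_estimate(f, controls, x):
    """Monte Carlo estimate of E[f(X)] with control variates.

    f        : target integrand
    controls : list of functions with (assumed) zero mean under the sampling law
    x        : i.i.d. sample from that law

    The coefficients beta are chosen by empirical variance minimisation,
    i.e. by least squares on the centered sample.
    """
    F = f(x)
    G = np.column_stack([g(x) for g in controls])
    # minimise the empirical variance of F - G @ beta
    beta, *_ = np.linalg.lstsq(G - G.mean(axis=0), F - F.mean(), rcond=None)
    return np.mean(F - G @ beta)

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
# estimate E[X + 1] = 1 for X ~ N(0,1), using the zero-mean control g(x) = x
est = cv_estimate(lambda s: s + 1.0, [lambda s: s], sample)
```

Here the control `g(x) = x` is perfectly correlated with the integrand, so the fitted coefficient removes essentially all of the sampling noise.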

In this paper we suggest a modification of the regression-based variance reduction approach recently proposed in Belomestny et al. This modification is based on the stratification technique and allows for a further significant variance reduction. The performance of the proposed approach is illustrated by several numerical examples.

We derive tight non-asymptotic bounds for the Kolmogorov distance between the probabilities of two Gaussian elements to hit a ball in a Hilbert space. The key property of these bounds is that they are dimension-free and depend on the nuclear (Schatten-one) norm of the difference between the covariance operators of the elements and on the norm of the mean shift. The obtained bounds significantly improve the bound based on Pinsker's inequality via the Kullback-Leibler divergence. We also establish an anti-concentration bound for a squared norm of a non-centered Gaussian element in Hilbert space. The paper presents a number of examples motivating our results and applications of the obtained bounds to statistical inference and to high-dimensional CLT.

Under correlation-type conditions, we derive upper bounds of order $1/\sqrt{n}$ for the Kolmogorov distance between the distributions of weighted sums of dependent summands and the normal law.

We study sharpened forms of the concentration of measure phenomenon, typically centered at stochastic expansions of order $d-1$ for any $d \in \mathbb{N}$. The bounds are based on $d$th-order derivatives or difference operators. In particular, we consider deviations of functions of independent random variables and differentiable functions over probability measures satisfying a logarithmic Sobolev inequality, and functions on the unit sphere. Applications include concentration inequalities for U-statistics as well as for classes of symmetric functions via polynomial approximations on the sphere (Edgeworth-type expansions).

We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn’s algorithm, we prove the complexity bound $\widetilde{O}\left(\frac{n^2}{\varepsilon^2}\right)$ arithmetic operations ($\widetilde{O}$ hides polylogarithmic factors $(\ln n)^c$, $c>0$). For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{\frac{n^{9/4}}{\varepsilon}, \frac{n^{2}}{\varepsilon^2} \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left(\frac{n^2}{\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.
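As a rough illustration of the building block behind the first algorithm, here is a minimal Sinkhorn iteration for entropically regularised OT (the function name, tolerance, and stopping rule are our own choices; the paper's complexity analysis concerns the full approximation scheme, not this sketch):

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iters=1000, tol=1e-9):
    """Entropic OT between discrete marginals a, b with cost matrix C.

    Alternately rescales rows and columns of the Gibbs kernel so that the
    transport plan matches both marginals (illustrative sketch only).
    """
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # match column marginal b
        u_new = a / (K @ v)           # match row marginal a
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    P = u[:, None] * K * v[None, :]   # transport plan
    return P, np.sum(P * C)           # plan and (regularised) transport cost

a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
C = (np.arange(3)[:, None] - np.arange(3)[None, :]) ** 2.0
P, cost = sinkhorn(a, b, C, eps=0.1)
```

Each iteration costs $O(n^2)$ arithmetic operations (two matrix-vector products), which is the per-iteration cost entering the complexity bounds above.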

A sample X_1,...,X_n consisting of independent identically distributed vectors in R^p with zero mean and a covariance matrix \Sigma is considered. The recovery of spectral projectors of high-dimensional covariance matrices from a sample of observations is a key problem in statistics arising in numerous applications. In their 2015 work, V. Koltchinskii and K. Lounici obtained non-asymptotic bounds for the Frobenius norm \|\hat P_r − P_r \|_2^2 of the distance between sample and true projectors and studied its asymptotic behavior for large samples. More specifically, asymptotic confidence sets for the true projector \P_r were constructed assuming that the moment characteristics of the observations are known. This paper describes a bootstrap procedure for constructing confidence sets for the spectral projector \P_r of the covariance matrix \Sigma from given data. This approach does not use the asymptotic distribution of \|\hat P_r − P_r \|_2^2 and does not require the computation of its moment characteristics. The performance of the bootstrap approximation procedure is analyzed.
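The bootstrap idea can be sketched roughly as follows; this is a hypothetical minimal implementation (plain resampling with replacement), and the paper's actual procedure, e.g. the choice of bootstrap weights, may differ:

```python
import numpy as np

def projector(S, r):
    """Projector onto the span of the r leading eigenvectors of S."""
    w, V = np.linalg.eigh(S)
    U = V[:, np.argsort(w)[::-1][:r]]
    return U @ U.T

def bootstrap_quantile(X, r, n_boot=500, alpha=0.05, seed=0):
    """(1-alpha)-quantile of the bootstrapped squared Frobenius distance
    between resampled and sample spectral projectors.

    X is assumed to be an (n, p) array of centered observations.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S_hat = X.T @ X / n
    P_hat = projector(S_hat, r)
    dists = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample with replacement
        S_b = X[idx].T @ X[idx] / n
        dists.append(np.sum((projector(S_b, r) - P_hat) ** 2))
    return np.quantile(dists, 1 - alpha)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
q = bootstrap_quantile(X, r=2, n_boot=100)
```

The returned quantile then serves as the radius of a confidence ball for the true projector around its empirical counterpart, without any moment computations.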

Upper bounds for the closeness of two centered Gaussian measures in the class of balls in a separable Hilbert space are obtained. The bounds are optimal with respect to the dependence on the spectra of the covariance operators of the Gaussian measures. The inequalities cannot be improved in the general case.

Let X_1,…,X_n be an i.i.d. sample in R^p with zero mean and the covariance matrix \Sigma^{*}. The classical PCA approach recovers the projector \P_J^{*} onto the principal eigenspace of \Sigma^{*} by its empirical counterpart \hat \P_J. The recent paper [24] investigated the asymptotic distribution of the Frobenius distance between the projectors \|\hat \P_J - \P_J^{*}\|_2, while [27] offered a bootstrap procedure to measure uncertainty in recovering this subspace \P_J^{*} even in a finite sample setup. The present paper considers this problem from a Bayesian perspective and suggests to use the credible sets of the pseudo-posterior distribution on the space of covariance matrices induced by the conjugate Inverse Wishart prior as sharp confidence sets. This yields a numerically efficient procedure. Moreover, we theoretically justify this method and derive finite sample bounds on the corresponding coverage probability. Contrary to [24, 27], the obtained results are valid for non-Gaussian data: the main assumption that we impose is the concentration of the sample covariance \hat \Sigma in a vicinity of \Sigma^{*}. Numerical simulations illustrate the good performance of the proposed procedure even on non-Gaussian data in a rather challenging regime.
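The pseudo-posterior construction can be sketched as follows; the prior hyperparameters, the sampler, and the function names are our own illustrative choices, and the code assumes the standard conjugate update IW(ν₀, Ψ₀) → IW(ν₀ + n, Ψ₀ + n·Σ̂) for centered Gaussian-type data:

```python
import numpy as np

def projector(S, r):
    """Projector onto the span of the r leading eigenvectors of S."""
    w, V = np.linalg.eigh(S)
    U = V[:, np.argsort(w)[::-1][:r]]
    return U @ U.T

def sample_inv_wishart(df, Psi, rng):
    """Draw from IW(df, Psi): if W ~ Wishart(df, Psi^{-1}), then
    W^{-1} ~ IW(df, Psi). Requires df >= dim(Psi)."""
    L = np.linalg.cholesky(np.linalg.inv(Psi))
    Z = rng.standard_normal((df, Psi.shape[0])) @ L.T
    return np.linalg.inv(Z.T @ Z)

def credible_radius(X, r, n_draws=300, alpha=0.05, seed=0):
    """(1-alpha)-quantile of ||P_J(Sigma_b) - P_J(hat Sigma)||_F^2 over
    pseudo-posterior draws Sigma_b ~ IW(nu0 + n, Psi0 + n * hat Sigma)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    nu0, Psi0 = p + 2, np.eye(p)          # illustrative prior hyperparameters
    S_hat = X.T @ X / n
    P_hat = projector(S_hat, r)
    dists = [np.sum((projector(sample_inv_wishart(nu0 + n,
                                                  Psi0 + n * S_hat, rng), r)
                     - P_hat) ** 2) for _ in range(n_draws)]
    return np.quantile(dists, 1 - alpha)

rng = np.random.default_rng(3)
X = rng.standard_normal((150, 4))
radius = credible_radius(X, r=1, n_draws=50)
```

The credible ball of this radius around the empirical projector is then used as a confidence set; no asymptotic distribution is needed.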

In this paper we propose a novel dual regression-based approach for pricing American options. This approach reduces the complexity of the nested Monte Carlo method and has an especially simple form for time-discretized diffusion processes. We analyse the complexity of the proposed approach both for a fixed and for an increasing number of exercise dates. The method is illustrated by several numerical examples.
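For context, the classical primal regression baseline (Longstaff--Schwarz-style least-squares Monte Carlo) against which dual approaches are usually benchmarked can be sketched as follows; this is not the paper's dual method, and all model and basis choices are illustrative:

```python
import numpy as np

def lsm_put_price(S0=100.0, K=100.0, r=0.06, sigma=0.2, T=1.0,
                  n_steps=50, n_paths=20000, seed=0):
    """Bermudan put price on a GBM by least-squares Monte Carlo.

    Continuation values are regressed on a quadratic polynomial basis
    over in-the-money paths (classical primal baseline, illustrative only).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # simulate GBM paths at times dt, 2*dt, ..., T
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))
    payoff = np.maximum(K - S[:, -1], 0.0)        # value at maturity
    for t in range(n_steps - 2, -1, -1):
        payoff *= np.exp(-r * dt)                 # discount one step back
        itm = (K - S[:, t]) > 0
        if itm.sum() > 10:
            X = S[itm, t]
            A = np.vander(X, 3)                   # basis [X^2, X, 1]
            coef, *_ = np.linalg.lstsq(A, payoff[itm], rcond=None)
            cont = A @ coef                       # estimated continuation value
            exercise = (K - X) > cont
            payoff[itm] = np.where(exercise, K - X, payoff[itm])
    return np.exp(-r * dt) * payoff.mean()        # discount to time 0
```

The nested-simulation cost that dual methods avoid arises when such continuation values are instead estimated by inner Monte Carlo at each exercise date.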

We consider a new method of semiparametric statistical estimation for continuous-time moving-average Lévy processes. We derive the convergence rates of the proposed estimators and show that these rates are optimal in the minimax sense.

This is an advanced guide to optimal stopping and control, focusing on advanced Monte Carlo simulation and its application to finance. Written for quantitative finance practitioners and researchers in academia, the book looks at the classical simulation-based algorithms before introducing some of the new, cutting-edge approaches under development.

In this paper we study the problem of statistical inference on the parameters of the semiparametric variance-mean mixtures. This class of mixtures has recently become rather popular in statistical and financial modelling. We design a semiparametric estimation procedure that first estimates the mean of the underlying normal distribution and then recovers nonparametrically the density of the corresponding mixing distribution. We illustrate the performance of our procedure on simulated and real data.

The estimation of the diffusion matrix Σ of a high-dimensional, possibly time-changed Lévy process is studied, based on discrete observations of the process with a fixed distance. A low-rank condition is imposed on Σ. Applying a spectral approach, we construct a weighted least-squares estimator with nuclear-norm penalisation. We prove oracle inequalities and derive convergence rates for the diffusion matrix estimator. The convergence rates show a surprising dependency on the rank of Σ and are optimal in the minimax sense for fixed dimensions. Theoretical results are illustrated by a simulation study.