# Publications

Demographic and population structure inference is one of the most important problems in genomics. Population parameters such as effective population sizes, population split times and migration rates are of high interest both in themselves and for many applications, e.g. genome-wide association studies. Hidden Markov model (HMM) based methods, such as PSMC, MSMC and coalHMM, have proved powerful and useful for estimating these parameters in many population genetics studies. At the same time, machine learning and deep learning have begun to be used widely in the natural sciences. In particular, deep learning based approaches have already replaced hidden Markov models in many areas, such as speech recognition and user input prediction. We develop a deep learning (DL) approach for local coalescent time estimation from one whole diploid genome. Our DL models are trained on simulated datasets. Importantly, demographic and population parameters can be inferred from the distribution of coalescent times. We expect that our approach will be useful under complex population scenarios that cannot be studied with existing HMM based methods. Our work is also a crucial step towards a deep learning framework that would make it possible to create population genomics methods for different genomic data representations.

Modern experiments in high-energy physics require an increasing amount of simulated data. Monte-Carlo simulation of calorimeter responses is by far the most computationally expensive part of such simulations. Recent works have shown that the application of generative neural networks to this task can significantly speed up the simulations while maintaining an appropriate degree of accuracy. This paper explores different approaches to designing and training generative neural networks for simulation of the electromagnetic calorimeter response in the LHCb experiment.

In the present work, we introduce a machine learning-based approach for galaxy clustering. Clusters must be determined in order to estimate the masses of galaxy groups. Knowledge of the mass distribution is crucial for dark matter research and for the study of the large-scale structure of the Universe. State-of-the-art telescopes accumulate data across various spectroscopic ranges, which highlights the need for algorithms that generalize well. The data we deal with is a combination of more than twenty different catalogues, and all of the combined galaxies must be clustered. We perform a regression on the redshifts and obtain a coefficient of determination *R*² of 0.99992 on the validation dataset, with a training dataset of 3,154,894 galaxies (0.0016 < *z* < 7.0519).

Modern large-scale data farms consist of hundreds of thousands of storage devices that span distributed infrastructure. Devices used in modern data centers (such as controllers, links, SSDs and HDDs) can fail due to hardware as well as software problems. Such failures or anomalies can be detected by monitoring the activity of components using machine learning techniques. In order to use these techniques, researchers need plenty of historical data from devices in normal and failure modes to train the algorithms. In this work, we address two problems: 1) the lack of storage data for the methods above, by creating a simulator, and 2) applying existing online algorithms that can more quickly detect a failure in one of the components.

We created a Go-based (golang) package for simulating the behavior of modern storage infrastructure. The software is based on the discrete-event modeling paradigm and captures the structure and dynamics of high-level storage system building blocks. The package's flexible structure allows us to create a model of a real-world storage system with a configurable number of components. The primary area of interest is exploring the storage machine's behavior under stress testing or operation over the medium or long term, in order to observe failures of its components.
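
The discrete-event paradigm mentioned above can be illustrated with a minimal sketch. It is written in Python for brevity and does not reflect the API of the Go package; the function `simulate`, its parameters (`mtbf`, `repair_time`, `horizon`) and the exponential failure model are all assumptions made for this illustration.

```python
import heapq
import random


def simulate(n_disks=5, horizon=1000.0, mtbf=200.0, repair_time=10.0, seed=0):
    """Toy discrete-event loop: disks fail at random times and are repaired.

    All names and the failure model are hypothetical; this is not the
    package described in the text.
    """
    rng = random.Random(seed)
    # The event queue is a min-heap ordered by event time.
    queue = [(rng.expovariate(1.0 / mtbf), disk, "fail") for disk in range(n_disks)]
    heapq.heapify(queue)
    log = []
    while queue:
        time, disk, kind = heapq.heappop(queue)
        if time > horizon:
            break  # every remaining event is later than the horizon
        log.append((time, disk, kind))
        if kind == "fail":
            # A failed disk comes back after a fixed repair delay.
            heapq.heappush(queue, (time + repair_time, disk, "repair"))
        else:
            # A repaired disk draws its next failure time.
            next_failure = time + rng.expovariate(1.0 / mtbf)
            heapq.heappush(queue, (next_failure, disk, "fail"))
    return log


event_log = simulate()
```

The simulation clock jumps from one event to the next rather than advancing in fixed steps, which is what makes long horizons cheap to simulate.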

To discover failures in the time series generated by the simulator, we modified a change-point detection algorithm that works in online mode. The goal of change-point detection is to discover changes in the distribution of a time series. This work describes an approach for failure detection in time series data based on direct density ratio estimation via binary classifiers.
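
The idea of estimating a density ratio with a binary classifier can be sketched as follows. This is an illustrative toy under stated assumptions, not the modified algorithm from the paper: the classifier is a hand-written one-dimensional logistic regression, and the window size, learning rate and synthetic series are hypothetical choices.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, epochs=300, lr=0.1):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def change_score(reference, test):
    """Mean estimated log density ratio log(p_test / p_ref) over the test window.

    With equally sized windows, the classifier odds p / (1 - p) estimate the
    density ratio, so the log ratio is simply the logit w * x + b.
    """
    xs = list(reference) + list(test)
    ys = [0] * len(reference) + [1] * len(test)
    w, b = fit_logistic(xs, ys)
    return sum(w * x + b for x in test) / len(test)


# Synthetic series (an assumption for the demo): the mean shifts at index 100.
rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) for _ in range(100)]
series += [rng.gauss(3.0, 1.0) for _ in range(100)]

stationary = change_score(series[0:40], series[40:80])   # no change inside
change = change_score(series[60:100], series[100:140])   # windows straddle the shift
```

In an online setting the two windows would slide along the stream, and an alarm would be raised when the score exceeds a threshold chosen on normal-mode data.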

We propose a novel approach for machine-learning-based detection of type Ia supernovae using photometric information. Unlike other approaches, only real observation data is used during training. Despite being trained on a relatively small sample, the method shows good results on real data from the Open Supernovae Catalog. We also investigate model transfer from the PLAsTiCC simulated training dataset to real data, and the reverse, and find that performance decreases significantly in both cases, highlighting the existing differences between simulated and real data.

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters one-by-one, so that the process can indeed be interpreted as that of detecting communities. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In cases where attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.

We describe a fully GPU-based implementation of the first level trigger for the upgrade of the LHCb detector, due to start data taking in 2021. We demonstrate that our implementation, named Allen, can process the 40 Tbit/s data rate of the upgraded LHCb detector and perform a wide variety of pattern recognition tasks. These include finding the trajectories of charged particles, finding proton–proton collision points, identifying particles as hadrons or muons, and finding the displaced decay vertices of long-lived particles. We further demonstrate that Allen can be implemented in around 500 scientific or consumer GPU cards, that it is not I/O bound, and can be operated at the full LHC collision rate of 30 MHz. Allen is the first complete high-throughput GPU trigger proposed for a HEP experiment.

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. Algorithms based on probabilistic community models require the node features to be categorical. We use a data-driven model by combining the least-squares data recovery criteria for both the graph structure and node features. This allows us to take into account both quantitative and categorical features. After deriving an equivalent complementary criterion to optimize, we apply a greedy algorithm for detecting communities in sequence. We experimentally show that our proposed method is effective on both real-world data and synthetic data. In cases where attributes are categorical, we compare our approach with state-of-the-art algorithms. Our algorithm appears competitive against them.

The results of an amplitude analysis of the charmless three-body decay $B^+ \to \pi^+\pi^+\pi^-$, in which CP-violation effects are taken into account, are reported. The analysis is based on a data sample corresponding to an integrated luminosity of 3 fb$^{-1}$ of $pp$ collisions recorded with the LHCb detector. The most challenging aspect of the analysis is the description of the behavior of the $\pi^+\pi^-$ S-wave contribution, which is achieved by using three complementary approaches based on the isobar model, the K-matrix formalism, and a quasi-model-independent procedure. Additional resonant contributions for all three methods are described using a common isobar model, and include the $\rho(770)^0$, $\omega(782)$ and $\rho(1450)^0$ resonances in the $\pi^+\pi^-$ P-wave, the $f_2(1270)$ resonance in the $\pi^+\pi^-$ D-wave, and the $\rho_3(1690)^0$ resonance in the $\pi^+\pi^-$ F-wave. Significant CP-violation effects are observed in both S- and D-waves, as well as in the interference between the S- and P-waves. The results from all three approaches agree and provide new insight into the dynamics and the origin of CP-violation effects in $B^+ \to \pi^+\pi^+\pi^-$ decays.

Recently, some specific classes of non-smooth and non-Lipschitz convex optimization problems were considered by Yu. Nesterov and H. Lu. We consider convex programming problems with similar smoothness conditions for the objective function and functional constraints. We introduce a new concept of an inexact model and propose some analogues of switching subgradient schemes for convex programming problems with a relatively Lipschitz-continuous objective function and functional constraints. A class of online convex optimization problems is also considered. The proposed methods are optimal in the class of optimization problems with relatively Lipschitz-continuous objective and functional constraints.

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters/communities one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. Our algorithm appears competitive against state-of-the-art algorithms.

We propose a novel method for gradient-based optimization of black-box simulators using differentiable local surrogate models. In fields such as physics and engineering, many processes are modeled with non-differentiable simulators with intractable likelihoods. Optimization of these forward models is particularly challenging, especially when the simulator is stochastic. To address such cases, we introduce the use of deep generative models to iteratively approximate the simulator in local neighborhoods of the parameter space. We demonstrate that these local surrogates can be used to approximate the gradient of the simulator, and thus enable gradient-based optimization of simulator parameters. In cases where the dependence of the simulator on the parameter space is constrained to a low dimensional submanifold, we observe that our method attains minima faster than baseline methods, including Bayesian optimization, numerical optimization and approaches using score function gradient estimators.
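
The loop described above (sample the simulator near the current parameters, fit a local surrogate, step along the surrogate's gradient) can be sketched in a few lines. To stay self-contained, this sketch swaps the deep generative surrogate for a local least-squares linear fit and uses a hypothetical one-dimensional simulator; `simulator`, `radius`, `n_samples` and `lr` are assumptions, not values from the paper.

```python
import random


def simulator(theta, rng):
    """Hypothetical stochastic black box: noisy quadratic with minimum at theta = 2."""
    return (theta - 2.0) ** 2 + rng.gauss(0.0, 0.1)


def local_surrogate_gradient(theta, rng, radius=0.5, n_samples=20):
    """Fit y ~ a + g * delta on samples near theta and return the slope g.

    A linear least-squares fit stands in for the deep generative surrogate.
    """
    deltas = [rng.uniform(-radius, radius) for _ in range(n_samples)]
    ys = [simulator(theta + d, rng) for d in deltas]
    mean_d = sum(deltas) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(deltas, ys))
    var = sum((d - mean_d) ** 2 for d in deltas)
    return cov / var


def optimize(theta0, steps=100, lr=0.1, seed=0):
    """Gradient descent in which each gradient comes from a freshly fitted local surrogate."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta -= lr * local_surrogate_gradient(theta, rng)
    return theta


theta_hat = optimize(theta0=-3.0)
```

The simulator is only ever queried for samples, never differentiated, which is the property that makes the scheme applicable to non-differentiable forward models.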

We propose a way to simulate Cherenkov detector response using a generative adversarial neural network to bypass low-level details. This network is trained to reproduce high-level features of the simulated detector events based on input observables of incident particles. This allows a dramatic increase in simulation speed. We demonstrate that this approach provides simulation precision that is consistent with the baseline and discuss possible implications of these results.

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. Existing approaches require the number of communities to be pre-specified. We apply the so-called data recovery approach to allow a relaxation of the criterion for finding communities one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In cases where attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them. © 2020 CEUR-WS. All rights reserved.

In this work, we propose an approach for electromagnetic shower generation at the track level. Currently, Monte Carlo simulation occupies 50-70% of the total computing resources used by physics experiments worldwide. Thus, speeding up the simulation step reduces simulation cost and accelerates synthetic experiments. In this paper, we suggest dividing the problem of shower generation into two separate tasks: graph generation and track feature generation. Both problems can be solved efficiently with a cascade of a deep autoregressive generative network and a graph convolutional network. The novelty of the proposed approach lies in applying neural networks to the generation of a complex recursive physical process.

It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our setting, each word is a node in a weighted graph and the distance between words is the shortest path distance between the corresponding nodes. We adopt a recent method learning a representation of data in the form of a differentiable weighted graph and use it to modify the GloVe training algorithm. We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks. Our analysis reveals that the structure of the learned graphs is hierarchical and similar to that of WordNet, the geometry is highly non-trivial and contains subgraphs with different local topology.
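
The distance used in this setting, the shortest path between two word nodes in a weighted graph, can be illustrated with a small sketch. The toy vocabulary and edge weights below are invented for the example (learned GraphGlove graphs are far larger), and Dijkstra's algorithm is used here simply as a standard way to compute shortest paths with non-negative weights.

```python
import heapq


def make_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_path_distance(graph, source, target):
    """Dijkstra's algorithm; returns inf when target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")


# Hypothetical word graph with made-up edge weights.
toy_graph = make_graph([
    ("animal", "cat", 1.0),
    ("animal", "dog", 1.0),
    ("cat", "kitten", 0.5),
    ("dog", "puppy", 0.5),
    ("cat", "dog", 2.5),
])

cat_dog = shortest_path_distance(toy_graph, "cat", "dog")
kitten_puppy = shortest_path_distance(toy_graph, "kitten", "puppy")
```

In this toy graph the direct edge between "cat" and "dog" is longer than the route through their shared hypernym "animal", so the distance follows the hierarchy rather than the direct link.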

The increasing luminosities of future Large Hadron Collider runs and the next generation of collider experiments will require an unprecedented amount of simulated events to be produced. Such large-scale productions are extremely demanding in terms of computing resources. Thus, new approaches to event generation and simulation of detector responses are needed. In LHCb, the accurate simulation of Cherenkov detectors takes a sizeable fraction of CPU time. An alternative approach is described here, in which one generates high-level reconstructed observables using a generative neural network to bypass low-level details. This network is trained to reproduce the particle species likelihood function values based on the track kinematic parameters and detector occupancy. The fast simulation is trained using real data samples collected by LHCb during Run 2. We demonstrate that this approach provides high-fidelity results.

The $\Xi_c^0$ baryon is unstable and usually decays into charmless final states by the $c \to su\bar{d}$ transition. It can, however, also disintegrate into a $\pi^-$ meson and a $\Lambda_c^+$ baryon via $s$ quark decay or via $cs \to dc$ weak scattering. The interplay between the latter two processes governs the size of the branching fraction $\mathcal{B}(\Xi_c^0 \to \pi^-\Lambda_c^+)$, first measured here to be $(0.55 \pm 0.02 \pm 0.18)\%$, where the first uncertainty is statistical and second systematic. This result is compatible with the larger of the theoretical predictions that connect models of hyperon decays using partially conserved axial currents and SU(3) symmetry with those involving the heavy-quark expansion and heavy-quark symmetry. In addition, the branching fraction of the normalization channel, $\mathcal{B}(\Xi_c^+ \to pK^-\pi^+) = (1.135 \pm 0.002 \pm 0.387)\%$ is measured.

We report four narrow peaks in the $\Xi_b^0 K^-$ mass spectrum obtained using $pp$ collisions at center-of-mass energies of 7, 8, and 13 TeV, corresponding to a total integrated luminosity of 9 fb$^{-1}$ recorded by the LHCb experiment. Referring to these states by their mass, the mass values are $m[\Omega_b(6316)^-] = 6315.64 \pm 0.31 \pm 0.07 \pm 0.50$ MeV, $m[\Omega_b(6330)^-] = 6330.30 \pm 0.28 \pm 0.07 \pm 0.50$ MeV, $m[\Omega_b(6340)^-] = 6339.71 \pm 0.26 \pm 0.05 \pm 0.50$ MeV, $m[\Omega_b(6350)^-] = 6349.88 \pm 0.35 \pm 0.05 \pm 0.50$ MeV, where the uncertainties are statistical, systematic, and the last is due to the knowledge of the $\Xi_b^0$ mass. The natural widths of the three lower mass states are consistent with zero, and the 90% confidence-level upper limits are determined to be $\Gamma[\Omega_b(6316)^-] < 2.8$ MeV, $\Gamma[\Omega_b(6330)^-] < 3.1$ MeV and $\Gamma[\Omega_b(6340)^-] < 1.5$ MeV. The natural width of the $\Omega_b(6350)^-$ peak is $1.4^{+1.0}_{-0.8} \pm 0.1$ MeV, which is $2.5\sigma$ from zero and corresponds to an upper limit of 2.8 MeV. The peaks have local significances ranging from $3.6\sigma$ to $7.2\sigma$. After accounting for the look-elsewhere effect, the significances of the $\Omega_b(6316)^-$ and $\Omega_b(6330)^-$ peaks are reduced to $2.1\sigma$ and $2.6\sigma$, respectively, while the two higher mass peaks exceed $5\sigma$. The observed peaks are consistent with expectations for excited $\Omega_b^-$ resonances.

The first observation of the decay $B^0 \to D^0\bar{D}^0K^+\pi^-$ is reported using proton-proton collision data corresponding to an integrated luminosity of 4.7 fb$^{-1}$ collected by the LHCb experiment in 2011, 2012 and 2016. The measurement is performed in the full kinematically allowed range of the decay outside of the $D^{*-}$ region. The ratio of the branching fraction relative to that of the control channel $B^0 \to D^{*-}D^0K^+$ is measured to be $R = (14.2 \pm 1.1 \pm 1.0)\%$, where the first uncertainty is statistical and the second is systematic. The absolute branching fraction of $B^0 \to D^0\bar{D}^0K^+\pi^-$ decays is thus determined to be $\mathcal{B}(B^0 \to D^0\bar{D}^0K^+\pi^-) = (3.50 \pm 0.27 \pm 0.26 \pm 0.30) \times 10^{-4}$, where the third uncertainty is due to the branching fraction of the control channel. This decay mode is expected to provide insights to spectroscopy and the charm-loop contributions in rare semileptonic decays.