# Publications

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. Several approaches have been proposed for this. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters one by one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving only quantitative features, only categorical attributes, or both. For the cases in which attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.

This paper proposes a meaningful and effective extension of the celebrated K-means algorithm to detect communities in feature-rich networks, based on our assumption of the non-summability mode. We approximate the given matrices of inter-node links and feature values in the least-squares sense, which leads to a straightforward extension of the conventional K-means clustering method as an alternating minimization strategy for the criterion. The method works in a two-fold space, embracing both the network nodes and features. The metric used is a weighted sum of the squared Euclidean distances in the feature and network spaces. To tackle the so-called curse of dimensionality, we extend this to a version that uses the cosine distances between entities and centers. One more version of our method is based on the Manhattan distance metric. We conduct computational experiments to test our method and compare its performance with that of popular competing algorithms on synthetic and real-world datasets. The cosine-based version of the extended K-means typically wins on the high-dimensional real-world datasets. In contrast, the Manhattan-based version wins on most synthetic datasets.
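The alternating minimization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `extended_kmeans`, `combined_distance`, and the weights `rho` and `xi` are hypothetical, the data are assumed to be already standardized, and the toy network is invented for the example.

```python
def sq_euclid(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def combined_distance(y_i, p_i, c_feat, c_link, rho=1.0, xi=1.0):
    """Weighted sum of squared distances in the feature and network spaces.
    rho and xi are assumed weights, not values taken from the paper."""
    return rho * sq_euclid(y_i, c_feat) + xi * sq_euclid(p_i, c_link)

def mean_rows(rows):
    """Component-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def extended_kmeans(Y, P, init_labels, k, rho=1.0, xi=1.0, max_iter=100):
    """Alternate between center updates and node reassignment.
    Y[i] holds the features of node i, P[i] its row of the link matrix."""
    labels = list(init_labels)
    for _ in range(max_iter):
        centers = {}
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster simply drops out
                centers[c] = (mean_rows([Y[i] for i in members]),
                              mean_rows([P[i] for i in members]))
        new_labels = [
            min(centers, key=lambda c: combined_distance(
                Y[i], P[i], centers[c][0], centers[c][1], rho, xi))
            for i in range(len(Y))
        ]
        if new_labels == labels:  # no node moved: a local minimum is reached
            break
        labels = new_labels
    return labels

# Toy data: two 3-node cliques whose single feature is near 0 and near 10.
P = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
Y = [[0.0], [0.1], [0.2], [10.0], [10.1], [9.9]]
labels = extended_kmeans(Y, P, init_labels=[0, 1, 0, 1, 0, 1], k=2)
```

Each node is represented by its feature vector together with its row of the link matrix, so the working space has as many dimensions as there are features plus nodes.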

A measurement of the lifetimes of the Ωc0 and Ξc0 baryons is reported using proton-proton collision data at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 5.4 fb−1 collected by the LHCb experiment. The Ωc0 and Ξc0 baryons are produced directly from proton interactions and reconstructed in the pK−K−π+ final state. The Ωc0 lifetime is measured to be 276.5 ± 13.4 ± 4.4 ± 0.7 fs, and the Ξc0 lifetime is measured to be 148.0 ± 2.3 ± 2.2 ± 0.2 fs, where the first uncertainty is statistical, the second systematic, and the third due to the uncertainty on the D0 lifetime. These results confirm previous LHCb measurements based on semileptonic beauty-hadron decays, which disagree with earlier results of a four times shorter Ωc0 lifetime, and provide the single most precise measurement of the Ωc0 lifetime.
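To get a feel for the overall precision of the two lifetimes, the three quoted uncertainties can be combined into a single number. The sketch below assumes the components are independent and adds them in quadrature. That is a common convention rather than something the abstract states, and the helper name `total_uncertainty` is hypothetical.

```python
import math

def total_uncertainty(*components):
    """Quadrature sum of uncertainty components, assumed independent."""
    return math.sqrt(sum(c ** 2 for c in components))

# Quoted values in fs: statistical, systematic, D0-lifetime uncertainty.
omega_c_total = total_uncertainty(13.4, 4.4, 0.7)
xi_c_total = total_uncertainty(2.3, 2.2, 0.2)

# Relative precision of each lifetime under the same assumption.
omega_c_relative = omega_c_total / 276.5
xi_c_relative = xi_c_total / 148.0
```

Under this assumption the Ωc0 lifetime is dominated by its statistical uncertainty, while the statistical and systematic components of the Ξc0 lifetime are of similar size.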

The *W* boson mass is measured using proton-proton collision data at √s = 13 TeV corresponding to an integrated luminosity of 1.7 fb−1 recorded during 2016 by the LHCb experiment. With a simultaneous fit of the muon *q*/*p*T distribution of a sample of *W* → *μν* decays and the *ϕ*\* distribution of a sample of *Z* → *μμ* decays, the *W* boson mass is determined to be

(formula)

where uncertainties correspond to contributions from statistical, experimental systematic, theoretical and parton distribution function sources. This is an average of results based on three recent global parton distribution function sets. The measurement agrees well with the prediction of the global electroweak fit and with previous measurements.

The production cross-section of the χc1(3872) state relative to the ψ(2S) meson is measured using proton-proton collision data collected with the LHCb experiment at centre-of-mass energies of √s = 8 and 13 TeV, corresponding to integrated luminosities of 2.0 and 5.4 fb−1, respectively. The two mesons are reconstructed in the J/ψπ+π− final state. The ratios of the prompt and nonprompt χc1(3872) to ψ(2S) production cross-sections are measured as a function of transverse momentum, pT, and rapidity, y, of the χc1(3872) and ψ(2S) states, in the kinematic range 4 < pT < 20 GeV/c and 2.0 < y < 4.5. The prompt ratio is found to increase with pT, independently of y. For the prompt component, the double ratio of the χc1(3872) and ψ(2S) production cross-sections between 13 and 8 TeV is observed to be consistent with unity, independent of pT and centre-of-mass energy.

Using proton-proton collision data, corresponding to an integrated luminosity of 9 fb−1 collected with the LHCb detector, seven decay modes of the Bc+ meson into a J/ψ or ψ(2S) meson and three charged hadrons, kaons or pions, are studied. The decays Bc+ → (ψ(2S) → J/ψπ+π−)π+, Bc+ → ψ(2S)π+π−π+, Bc+ → J/ψK+π−π+ and Bc+ → J/ψK+K−K+ are observed for the first time, and evidence for the Bc+ → ψ(2S)K+K−π+ decay is found, where J/ψ and ψ(2S) mesons are reconstructed in their dimuon decay modes. The ratios of branching fractions between the different Bc+ decays are reported as well as the fractions of the decays proceeding via intermediate resonances. The results largely support the factorisation approach used for a theoretical description of the studied decays.

The aim of the study was to assess how well age (age group) at death can be determined by applying classification techniques to histomorphometric characteristics of osseous and cartilaginous tissue aging.

Materials and Methods. The study material was a database containing the findings of morphometric studies of osseous and cartilaginous tissue histologic specimens from 294 categorized male corpses aged 10–93 years. For data analysis and classification we used modern machine learning methods: k-NN, SVM, logistic regression, CatBoost, SGD, naive Bayes, random forest, nonlinear dimensionality reduction methods (t-SNE and UMAP), and recursive feature elimination for feature selection.

Results. The techniques (algorithms) used provided an effective representation of a complex data set (76 histomorphometric features) and revealed the cluster structure inside the low-dimensional feature space, which makes fitting a classifier more reasonable. During feature selection, we estimated the importance of the features for age group classification and studied the relationship between classification quality and the number of features in the feature space. Data pre-processing made it possible to remove noise and keep the most informative features, thereby accelerating the learning process and improving the classification quality. Data projection showed a more well-defined cluster structure in the space of selected features. The accuracy of establishing certain groups was equal to 90%. This demonstrates the high efficiency of machine learning techniques for forensic age diagnostics based on histomorphometric findings.

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. Algorithms based on probabilistic community models require the node features to be categorical. We use a data-driven model that combines the least-squares data recovery criteria for both the graph structure and node features. This allows us to take into account both quantitative and categorical features. After deriving an equivalent complementary criterion to optimize, we apply a greedy algorithm for detecting communities in sequence. We experimentally show that our proposed method is effective on both real-world data and synthetic data. For the cases in which attributes are categorical, we compare our approach with state-of-the-art algorithms. Our algorithm appears competitive against them.

We propose an extension of the celebrated K-means algorithm for community detection in feature-rich networks. Our least-squares criterion leads to a straightforward extension of the conventional batch K-means clustering method as an alternating optimization strategy for the criterion. By replacing the innate squared Euclidean distance with cosine distance, we effectively tackle the so-called curse of dimensionality. We compare our proposed methods with state-of-the-art algorithms from the literature using synthetic and real-world data. The cosine distance-based version appears to be the overall winner, especially on larger datasets.
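For reference, the cosine distance mentioned above is usually defined as one minus the cosine of the angle between two vectors. The sketch below uses that standard definition. Whether the paper applies extra normalization is not stated here, the function name is hypothetical, and the handling of zero vectors is an assumption made only to keep the example total.

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v), in [0, 2] for non-zero vectors.
    Assumption: a zero vector is treated as maximally dissimilar (1.0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
opposite = cosine_distance([1.0, 0.0], [-3.0, 0.0])
```

Because the measure depends only on direction, rescaling a node's combined feature and link vector does not change its distance to a center.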

The first full angular analysis of the $B^0 \to D^{*-} D_s^{*+}$ decay is performed using 6 fb$^{-1}$ of *pp* collision data collected with the LHCb experiment at a centre-of-mass energy of 13 TeV. The $D_s^{*+} \to D_s^+ \gamma$ and $D^{*-} \to \bar{D}^0 \pi^-$ vector meson decays are used with the subsequent $D_s^+ \to K^+ K^- \pi^+$ and $\bar{D}^0 \to K^+ \pi^-$ decays. All helicity amplitudes and phases are measured, and the longitudinal polarisation fraction is determined to be $f_{\rm L} = 0.578 \pm 0.010 \pm 0.011$ with world-best precision, where the first uncertainty is statistical and the second is systematic. The pattern of helicity amplitude magnitudes is found to align with expectations from quark-helicity conservation in *B* decays. The ratio of branching fractions $[\mathcal{B}(B^0 \to D^{*-} D_s^{*+}) \times \mathcal{B}(D_s^{*+} \to D_s^+ \gamma)] / \mathcal{B}(B^0 \to D^{*-} D_s^+)$ is measured to be $2.045 \pm 0.022 \pm 0.071$ with world-best precision. In addition, the first observation of the Cabibbo-suppressed $B_s \to D^{*-} D_s^+$ decay is made with a significance of seven standard deviations. The branching fraction ratio $\mathcal{B}(B_s \to D^{*-} D_s^+) / \mathcal{B}(B^0 \to D^{*-} D_s^+)$ is measured to be $0.049 \pm 0.006 \pm 0.003 \pm 0.002$, where the third uncertainty is due to limited knowledge of the ratio of fragmentation fractions.

We present an angular analysis of the $B^+ \to K^{*+}(\to K^0_{\rm S}\pi^+)\mu^+\mu^-$ decay using 9 fb$^{-1}$ of *pp* collision data collected with the LHCb experiment. For the first time, the full set of CP-averaged angular observables is measured in intervals of the dimuon invariant mass squared. Local deviations from Standard Model predictions are observed, similar to those in previous LHCb analyses of the isospin-partner $B^0 \to K^{*0}\mu^+\mu^-$ decay. The global tension is dependent on which effective couplings are considered and on the choice of theory nuisance parameters.

The common approach to constructing a classifier for particle selection assumes reasonable consistency between the training data samples and the target data sample used for a particular analysis. However, training and target data may have very different properties, such as the energy spectra of signal and background contributions. We propose a new method based on an ensemble of pre-trained classifiers, each trained on an exclusive subset of the total dataset, called a data basket. Adjusting the separation threshold of each basket classifier separately makes it possible to tune the combined classifier dynamically and to make optimal predictions for data with differing properties without re-training the classifier. The approach is illustrated with a toy example. The dependence of quality on the number of data baskets used is also presented.
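The idea of re-tuning thresholds without re-training can be illustrated with a small sketch. The abstract does not say how the per-basket decisions are combined, so the majority vote below is purely an assumption, and the class `BasketEnsemble`, its methods, and the toy scoring functions are all hypothetical.

```python
from typing import Callable, List, Sequence

# A frozen, pre-trained scorer maps an event to a signal-likeness score.
Scorer = Callable[[Sequence[float]], float]

class BasketEnsemble:
    """Pre-trained basket scorers with individually adjustable thresholds."""

    def __init__(self, scorers: List[Scorer], thresholds: List[float]):
        if len(scorers) != len(thresholds):
            raise ValueError("one threshold is needed per basket scorer")
        self.scorers = scorers
        self.thresholds = list(thresholds)

    def set_thresholds(self, thresholds: List[float]) -> None:
        """Adapt to a new target sample; the scorers stay untouched."""
        if len(thresholds) != len(self.scorers):
            raise ValueError("one threshold is needed per basket scorer")
        self.thresholds = list(thresholds)

    def predict(self, event: Sequence[float]) -> bool:
        """Assumed combination rule: strict majority of basket votes."""
        votes = sum(score(event) > cut
                    for score, cut in zip(self.scorers, self.thresholds))
        return 2 * votes > len(self.scorers)

# Toy stand-ins for three classifiers trained on different baskets.
scorers = [lambda e: e[0], lambda e: e[1], lambda e: 0.5 * (e[0] + e[1])]
ensemble = BasketEnsemble(scorers, thresholds=[0.5, 0.5, 0.5])

event = (0.6, 0.3)
before = ensemble.predict(event)           # only the first scorer votes yes
ensemble.set_thresholds([0.5, 0.2, 0.4])   # re-tuned for a different target
after = ensemble.predict(event)            # now all three scorers vote yes
```

Only the list of thresholds changes between the two predictions, which is what allows the combined classifier to follow a shift in the target data cheaply.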

Demographic and population structure inference is one of the most important problems in genomics. Population parameters such as effective population sizes, population split times and migration rates are of high interest both in themselves and for many applications, e.g. genome-wide association studies. Hidden Markov Model (HMM) based methods, such as PSMC, MSMC, coalHMM etc., have proved to be powerful and useful for estimating these parameters in many population genetics studies. At the same time, machine and deep learning have begun to be widely used in the natural sciences. In particular, deep learning based approaches have already replaced hidden Markov models in many areas, such as speech recognition or user input prediction. We develop a deep learning (DL) approach for local coalescent time estimation from one whole diploid genome. Our DL models are trained on simulated datasets. Importantly, demographic and population parameters can be inferred based on the distribution of coalescent times. We expect that our approach will be useful under complex population scenarios, which cannot be studied with existing HMM based methods. Our work is also a crucial step in developing a deep learning framework that would make it possible to create population genomics methods for different genomic data representations.

The main result of this paper is an extension of the K-means algorithm to the problem of community detection in feature-rich networks. This is based on a data-recovery criterion additively combining conventional least-squares criteria for approximation of the network link data and the feature data at network nodes. The dimension of the space in which the method operates is the sum of the number of nodes and the number of features, which may be high indeed. To tackle the so-called curse of dimensionality, we replace the innate Euclidean distance with cosine distance. We experimentally validate our proposed methods and demonstrate their efficiency by comparing them to the most popular approaches using both synthetic data and real-world data.

First evidence of a structure in the J/ψΛ invariant mass distribution is obtained from an amplitude analysis of Ξb− → J/ψΛK− decays. The observed structure is consistent with being due to a charmonium pentaquark with strangeness with a significance of 3.1σ including systematic uncertainties and look-elsewhere effect. Its mass and width are determined to be $4458.8 \pm 2.9\,^{+4.7}_{-1.1}$ MeV and $17.3 \pm 6.5\,^{+8.0}_{-5.7}$ MeV, respectively, where the quoted uncertainties are statistical and systematic. The structure is also consistent with being due to two resonances. In addition, the narrow excited Ξ− states, Ξ(1690)− and Ξ(1820)−, are seen for the first time in a Ξb− decay, and their masses and widths are measured with improved precision. The analysis is performed using pp collision data corresponding to a total integrated luminosity of 9 fb−1, collected with the LHCb experiment at centre-of-mass energies of 7, 8 and 13 TeV.

We explore the hidden feedback loop effect in online recommender systems. Feedback loops cause online multi-armed bandit (MAB) recommendations to degrade to a small subset of items, with a loss of coverage and novelty. We study how uncertainty and noise in user interests influence the existence of feedback loops. First, we show that an unbiased additive random noise in user interests does not prevent a feedback loop. Second, we demonstrate that a non-zero probability of resetting user interests is sufficient to limit the feedback loop and estimate the size of the effect. Our experiments confirm the theoretical findings in a simulated environment for four bandit algorithms.

Modern experiments in high-energy physics require an increasing amount of simulated data. Monte-Carlo simulation of calorimeter responses is by far the most computationally expensive part of such simulations. Recent works have shown that the application of generative neural networks to this task can significantly speed up the simulations while maintaining an appropriate degree of accuracy. This paper explores different approaches to designing and training generative neural networks for simulation of the electromagnetic calorimeter response in the LHCb experiment.

A flavour-tagged time-dependent angular analysis of $B_s^0 \to J/\psi\,\phi$ decays is presented where the $J/\psi$ meson is reconstructed through its decay to an $e^+e^-$ pair. The analysis uses a sample of *pp* collision data recorded with the LHCb experiment at centre-of-mass energies of 7 and 8 TeV, corresponding to an integrated luminosity of 3 fb$^{-1}$. The CP-violating phase and lifetime parameters of the $B_s^0$ system are measured to be $\phi_s = 0.00 \pm 0.28 \pm 0.07$ rad, $\Delta\Gamma_s = 0.115 \pm 0.045 \pm 0.011$ ps$^{-1}$ and $\Gamma_s = 0.608 \pm 0.018 \pm 0.012$ ps$^{-1}$, where the first uncertainty is statistical and the second systematic. This is the first time that CP-violating parameters are measured in the $B_s^0 \to J/\psi\,\phi$ decay with an $e^+e^-$ pair in the final state. The results are consistent with previous measurements in other channels and with the Standard Model predictions.

The first observation of the suppressed semileptonic Bs0 → K−μ+νμ decay is reported. Using a data sample recorded in pp collisions in 2012 with the LHCb detector, corresponding to an integrated luminosity of 2 fb−1, the branching fraction B(Bs0 → K−μ+νμ) is measured to be $[1.06 \pm 0.05\,(\text{stat}) \pm 0.08\,(\text{syst})] \times 10^{-4}$, where the first uncertainty is statistical and the second one represents the combined systematic uncertainties. The decay Bs0 → Ds−μ+νμ, where Ds− is reconstructed in the final state K+K−π−, is used as a normalization channel to minimize the experimental systematic uncertainty. Theoretical calculations on the form factors of the Bs0 → K− and Bs0 → Ds− transitions are employed to determine the ratio of the Cabibbo-Kobayashi-Maskawa matrix elements |Vub|/|Vcb| at low and high Bs0 → K− momentum transfer.