The centre's main objectives are:
1. developing and advancing interpretable machine learning and data mining methods for NLP and recommender systems
2. developing models that enhance the functionality of existing large language models by leveraging additional resources: linguistic models, knowledge models, search models, and planning algorithms
3. developing models and methods for automatic knowledge acquisition using large language models (LLMs), including methods for transfer learning between different languages and different tasks
4. developing models and methods for research, modelling, and analysis within the framework of complex systems theory
5. developing semantic analysis tools based on mathematical methods in formal concept theory
Structure
International Laboratory of Intelligent Systems and Structural Analysis
We conduct research that enables the integration of structural and neural network representations in applied data analysis tasks
Laboratory of Models and Methods of Computational Pragmatics
We work on natural language processing (NLP), interpretable machine learning, and data mining, develop recommender systems and services, and advance multimodal clustering and classification methods that enable the creation of user interest profiles across multiple modalities
Laboratory of Complex Systems Modelling and Control
We conduct fundamental and applied research in the mathematical modelling of complex systems: studying synchronisation phenomena, sudden regime changes, quasi-regularities, and self-organisation; evaluating the effectiveness of rare-event forecasting algorithms; and managing complex systems
Semantics Analysis Laboratory (in Russian)
Study of natural language as a whole within the natural science paradigm using methods of computer science and applied mathematics
Management
Director of the Centre, Doctor of Sciences, Professor
Deputy Director of the Centre, Candidate of Sciences
Publications
-
Book
Data Analytics and Management in Data Intensive Domains: 25th International Conference, DAMDID/RCDL 2023, Moscow, Russia, October 24–27, 2023, Revised Selected Papers
This book constitutes the post-conference proceedings of the 25th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2023, held in Moscow, Russia, during 24-27 October 2023.
The 21 papers presented here were carefully reviewed and selected from 75 submissions. These papers are organized in the following topical sections: Data Models and Knowledge Graphs; Databases in Data Intensive Domains; Machine learning methods and applications; Data Analysis in Astronomy & Information extraction from text. Papers from keynote talks have also been included in this book.
Vol. 2086: Communications in Computer and Information Science. Springer, 2024.
-
Article
Perturbation Theory with Accelerated Convergence for Fibre-Optic Nonlinearity Compensation
This paper presents an enhanced perturbation theory-based approach for compensating nonlinear distortion in long-haul fibre-optic communication systems. The proposed method combines a perturbation-based compensator for fibre nonlinearity with machine learning, achieving high compensation accuracy with reduced computational complexity. We derive the theoretical framework for a modified perturbation method that leverages an effective lossless fibre model, uses a data-driven optimisation of the first-order perturbation term, and is naturally parallelisable. Numerical simulations for a dual-polarisation 16-QAM transmission link demonstrate that the learned first-order perturbation compensator can achieve performance comparable to the split-step Fourier method (SSFM) while maintaining lower complexity. We compare the proposed method with standard SSFM, both in the full link model and in an effective lossless model, as well as with conventional perturbation-based and purely linear compensation techniques. The results show that the machine learning-augmented perturbation approach provides superior accuracy over standard perturbation methods, often matching the benchmark SSFM on an effective model. The study also reveals that higher-order perturbation terms beyond the first order yield diminishing returns and can even degrade performance if not properly handled.
Communications in Nonlinear Science and Numerical Simulation. 2026. Vol. 153.
-
Book chapter
From Patterns to Predictions: A Shapelet-Based Framework for Directional Forecasting in Noisy Financial Markets
Directional forecasting in financial markets requires both accuracy and interpretability. Before the advent of deep learning, interpretable approaches based on human-defined patterns were prevalent, but their structural vagueness and scale ambiguity hindered generalization. In contrast, deep learning models can effectively capture complex dynamics, yet often offer limited transparency. To bridge this gap, we propose a two-stage framework that integrates unsupervised pattern extraction with interpretable forecasting. (i) SIMPC segments and clusters multivariate time series, extracting recurrent patterns that are invariant to amplitude scaling and temporal distortion, even under varying window sizes. (ii) JISC-Net is a shapelet-based classifier that uses the initial part of extracted patterns as input and forecasts subsequent partial sequences for short-term directional movement. Experiments on Bitcoin and three S&P 500 equities demonstrate that our method ranks first or second in 11 out of 12 metric-dataset combinations, consistently outperforming baselines. Unlike conventional deep learning models that output buy-or-sell signals without interpretable justification, our approach enables transparent decision-making by revealing the underlying pattern structures that drive predictive outcomes.
In bk.: CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management. ACM, 2025.
-
Working paper
Versions of least-squares k-means algorithm for interval data
Recently, k-means clustering has been extended to so-called interval data. In contrast to the conventional case, interval data feature values are intervals rather than single real numbers. This paper further explores the least-squares criterion for k-means clustering to tackle the issue of initialization, that is, finding a proper set of initial cluster centers in interval data clustering. Specifically, we extend to interval data a Pythagorean decomposition of the data scatter into the sum of two terms: one is the genuine k-means least-squares criterion; the other is a complementary criterion that requires the clusters to be numerous and anomalous. We therefore propose a method for obtaining anomalous clusters one by one. After a run of the method, we start k-means iterations from the centers of the most numerous of the anomalous clusters found. We test and validate the proposed BIKM algorithm on versions of two newly introduced interval datasets.
Mathematical Methods for Decision Making in Economics, Business and Politics. WP7. HSE Publishing House, 2024.