About

The Centre for Language and Semantic Technologies is part of the HSE Faculty of Computer Science. It was created to address problems in natural language processing and to develop semantic technologies based on both interpretable artificial intelligence methods and modern machine learning models.

The centre's main objectives are:


1. developing and advancing interpretable machine learning and data mining methods for NLP and recommender systems

2. developing models that enhance the functionality of existing large language models by leveraging additional resources: linguistic models, knowledge models, search models, and planning algorithms

3. developing models and methods for automatic knowledge acquisition using large language models (LLMs), including methods for transfer learning between different languages and different tasks

4. developing models and methods for research, modelling, and analysis within the framework of complex systems theory

5. developing semantic analysis tools based on mathematical methods of formal concept analysis

Structure

International Laboratory of Intelligent Systems and Structural Analysis

We conduct research that enables the integration of structural and neural network representations in applied data analysis tasks

Laboratory of Models and Methods of Computational Pragmatics

We work on natural language processing (NLP), interpretable machine learning, and data mining; develop recommender systems and services; and advance multimodal clustering and classification methods that enable the creation of user interest profiles across multiple modalities

Laboratory of Complex Systems Modelling and Control

We conduct fundamental and applied scientific research in the mathematical modelling of complex systems: we study synchronisation phenomena, sudden regime changes, quasi-regularities, and self-organisation; evaluate the effectiveness of rare-event forecasting algorithms; and develop methods for controlling complex systems

Semantics Analysis Laboratory

We study natural language as a whole within the natural science paradigm, using methods of computer science and applied mathematics

Management

Sergei Kuznetsov

Director of the Centre, Doctor of Sciences, Professor

Marina Zhelyazkova

Deputy Director of the Centre, Candidate of Sciences

Publications

  • Data Analytics and Management in Data Intensive Domains: 25th International Conference, DAMDID/RCDL 2023, Moscow, Russia, October 24–27, 2023, Revised Selected Papers

    This book constitutes the post-conference proceedings of the 25th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2023, held in Moscow, Russia, on October 24–27, 2023.


    The 21 papers presented here were carefully reviewed and selected from 75 submissions. These papers are organized in the following topical sections: Data Models and Knowledge Graphs; Databases in Data Intensive Domains; Machine Learning Methods and Applications; Data Analysis in Astronomy; and Information Extraction from Text. Papers from keynote talks have also been included in this book.


    Vol. 2086: Communications in Computer and Information Science. Springer, 2024.

  • Free energy of neural network can predict accuracy after pruning

    Neural networks are powerful tools capable of achieving state-of-the-art performance across a wide range of tasks; however, their effectiveness often comes at the cost of extremely large numbers of parameters, which can hinder their deployment in resource-constrained environments. To address this issue, various pruning techniques have been proposed to reduce model size and complexity while preserving performance. In this study, we first propose a thermodynamic perspective for analyzing the behavior of neural networks during the pruning process based on magnitude-based weight pruning. Second, we demonstrate that by employing the thermodynamic concept of free energy, the selection procedure for the pruning level can be significantly simplified and accelerated. Thus, in this work, we propose a fast method for selecting the pruning threshold by computing the network’s free energy. We evaluate our method on classification tasks in the domains of natural language processing and computer vision, considering models such as multilayer perceptrons (MLP), encoder–decoder transformers, encoder-only transformers, pretrained transformers, VGG, ResNet, and DenseNet. Experimental results demonstrate that our approach provides a good approximation of the optimal pruning threshold for MLP and transformer-based models while significantly reducing the computational time (at least 70 times) compared to evaluating model accuracy.

    Physica A: Statistical Mechanics and its Applications. 2025. Vol. 681. P. 1-16.
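    To fix ideas, here is a minimal, hypothetical sketch of only the baseline operation the paper builds on, magnitude-based weight pruning, with the pruning level supplied by hand. The free-energy criterion for selecting that level is the paper's own contribution and is not reproduced here; all names below are illustrative.

        # Hedged sketch: magnitude-based pruning only; the paper's free-energy
        # threshold selection is NOT implemented here.
        import numpy as np

        def magnitude_prune(weights: np.ndarray, prune_level: float) -> np.ndarray:
            """Zero out the fraction `prune_level` of weights with smallest magnitude."""
            flat = np.abs(weights).ravel()
            k = int(prune_level * flat.size)
            if k == 0:
                return weights.copy()
            threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
            return weights * (np.abs(weights) > threshold)

        # toy usage: prune 70% of a random weight matrix
        rng = np.random.default_rng(0)
        w = rng.normal(size=(256, 256))
        w_pruned = magnitude_prune(w, prune_level=0.7)
        print(f"sparsity: {np.mean(w_pruned == 0):.2f}")  # ~0.70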

  • Book chapter

    Tariq W., Popov V., Gromov V.

    Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling

    In this paper, we showcase a comprehensive end-to-end pipeline for creating a superior Bartangi language corpus and using it for training word embeddings. The critically low-resource Pamiri language of Bartangi, which is spoken in Tajikistan, has difficulties such as morphological complexity, orthographic variety, and a lack of data. In order to overcome these obstacles, we gathered a raw corpus of roughly 6,550 phrases, used the Uniparser-Morph-Bartangi morphological analyzer for linguistically accurate lemmatization, and implemented a thorough cleaning procedure to eliminate noise and ensure proper tokenization. The resulting lemmatized corpus greatly lowers word sparsity and raises the standard of linguistic analysis. The processed corpus was then used to train two different Word2Vec models, Skipgram and CBOW, with a vector size of 100, a context window of 5, and a minimum frequency threshold of 1. The resulting word embeddings were visualised using dimensionality reduction techniques such as PCA (Pearson, 1901) and t-SNE (van der Maaten and Hinton, 2008), and assessed using intrinsic methods such as nearest-neighbor similarity tests. Our tests show that meaningful semantic representations can be obtained even from tiny datasets by combining informed morphological analysis with clean preprocessing. One of the earliest computational datasets for Bartangi, this resource serves as a vital basis for upcoming NLP tasks, such as language modeling, semantic analysis, and low-resource machine translation. To promote more research in Pamiri and other under-represented languages, we make the corpus, lemmatizer pipeline, and trained embeddings publicly available.

    In bk.: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2025). Shumen: INCOMA Ltd, 2025. P. 1256-1262.
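    As an illustration of the embedding-training step described above, here is a hedged sketch using gensim's Word2Vec with the hyperparameters stated in the abstract (vector size 100, window 5, minimum frequency 1). The corpus file name and its one-sentence-per-line format are assumptions, not details from the paper.

        from gensim.models import Word2Vec

        # assumption: lemmatized corpus stored one sentence per line,
        # with space-separated tokens
        with open("bartangi_lemmatized.txt", encoding="utf-8") as f:
            sentences = [line.split() for line in f if line.strip()]

        # hyperparameters as stated in the abstract; sg selects Skipgram (1) or CBOW (0)
        skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
        cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

        # intrinsic evaluation: nearest neighbours of a query lemma
        query = sentences[0][0]  # pick some token that is present in the corpus
        print(skipgram.wv.most_similar(query, topn=5))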

  • Working paper

    Mirkin B., Parinov A., Halynchyk M. et al.

    Versions of least-squares k-means algorithm for interval data

    Recently, k-means clustering has been extended to so-called interval data. In contrast to the conventional data case, interval data feature values are intervals rather than single reals. This paper further explores the least-squares criterion for k-means clustering to tackle the issue of initialization, that is, finding a proper set of initial cluster centers for interval data clustering. Specifically, we extend, for interval data, a Pythagorean decomposition of the data scatter into the sum of two items, one being a genuine k-means least-squares criterion, the other a complementary criterion requiring the clusters to be numerous and anomalous. We therefore propose a method for obtaining anomalous clusters one by one. After a run of the method, we start k-means iterations from the centers of the most numerous of the found anomalous clusters. We test and validate our proposed BIKM algorithm on versions of two newly introduced interval datasets.

    Mathematical Methods of Decision Analysis in Economics, Business and Politics. WP7. HSE Publishing House, 2024
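    A minimal sketch of least-squares k-means on interval data may help fix ideas: each feature value is a pair (lower, upper), the squared distance sums the squared gaps of both endpoints, and centers are componentwise means. The paper's actual contribution, the anomalous-cluster (BIKM) initialization, is not reproduced; this sketch falls back to random initial centers.

        import numpy as np

        def interval_kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
            """Least-squares k-means for interval data.

            X has shape (n, m, 2): X[i, j] = (lower, upper) bound of feature j
            for object i. Returns cluster labels and interval-valued centers.
            """
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), size=k, replace=False)]  # random init, not BIKM
            for _ in range(n_iter):
                # squared endpoint-wise distances of every object to every center
                d = ((X[:, None] - centers[None]) ** 2).sum(axis=(2, 3))
                labels = d.argmin(axis=1)
                new_centers = centers.copy()
                for c in range(k):
                    members = X[labels == c]
                    if len(members):  # keep the old center if a cluster empties
                        new_centers[c] = members.mean(axis=0)
                if np.allclose(new_centers, centers):
                    break
                centers = new_centers
            return labels, centers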

All publications