About

The Centre for Language and Semantic Technologies is part of the HSE Faculty of Computer Science. It was created to address natural language processing and the development of semantic technologies based on both interpretable artificial intelligence methods and modern machine learning models.

The centre's main objectives are:


1. developing and advancing interpretable machine learning and data mining methods for NLP and recommender systems

2. developing models that enhance the functionality of existing large language models by leveraging additional resources: linguistic models, knowledge models, search models, and planning algorithms

3. developing models and methods for automatic knowledge acquisition using large language models (LLMs), including methods for transfer learning between different languages and different tasks

4. developing models and methods for research, modelling, and analysis within the framework of complex systems theory

5. developing semantic analysis tools based on mathematical methods in formal concept theory

Structure

International Laboratory of Intelligent Systems and Structural Analysis

We conduct research that enables the integration of structural and neural network representations in applied data analysis tasks.

Laboratory of Models and Methods of Computational Pragmatics

We work on natural language processing (NLP), interpretable machine learning, and data mining. We also develop recommender systems and services, and advance multimodal clustering and classification methods for building user interest profiles across multiple modalities.

Laboratory of Complex Systems Modelling and Control

We conduct fundamental and applied research in the mathematical modelling of complex systems: studying synchronisation phenomena, sudden regime changes, quasi-regularities, and self-organisation; evaluating the effectiveness of rare-event forecasting algorithms; and controlling complex systems.

Semantics Analysis Laboratory (in Russian)

We study natural language as a whole within the natural science paradigm, using methods from computer science and applied mathematics.

Management

Sergei Kuznetsov

Director of the Centre, Doctor of Sciences, Professor

Marina Zhelyazkova

Deputy Director of the Centre, Candidate of Sciences

Publications

  • Book

    Edited by: A. Panchenko, D. Gubanov, M. Khachay et al.

    Analysis of Images, Social Networks and Texts. AIST 2024

    Iss. 2364. Springer Nature Switzerland, Cham, 2024.

  • Article

    Chertoganov K. A.

    Non-Markovian semantic diffusion in multidimensional spaces: an approach based on McKean–Vlasov–Fokker–Planck equations

    Information diffusion models traditionally focus on event propagation, user activity and network interactions, providing limited mathematical insight into the evolution of semantic content itself. In this paper, narrative evolution is formulated as the evolution of probability measures in a multidimensional semantic space. We construct a mathematical bridge from event cascades to empirical semantic measures and subsequently to nonlinear density dynamics. The resulting framework is described by a non-Markovian McKean–Vlasov–Fokker–Planck equation incorporating semantic drift, collective interaction and memory effects. Within this setting, stable narrative configurations are characterized as stationary density solutions, while structural semantic transitions are associated with changes in their stability properties. The proposed model provides a unified probabilistic description of semantic diffusion and establishes a theoretical foundation for the analysis of long-term narrative evolution, semantic stabilization and regime transitions in complex information environments.

    Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering. Chelyabinsk. 2026.

  • Book chapter

    Chervyakov A., Isaeva U., Emelyanov A. et al.

    Multimodal Evaluation of Russian-language Architectures.

    Multimodal large language models (MLLMs) are currently at the center of research attention, showing rapid progress in scale and capabilities, yet their intelligence, limitations, and risks remain insufficiently understood. To address these issues, particularly in the context of the Russian language, where no multimodal benchmarks currently exist, we introduce MERA Multi, an open multimodal evaluation framework for Russian-language architectures. The benchmark is instruction-based and encompasses default text, image, audio, and video modalities, comprising 18 newly constructed evaluation tasks for both general-purpose models and modality-specific architectures (image-to-text, video-to-text, and audio-to-text). Our contributions include: (i) a universal taxonomy of multimodal abilities; (ii) 18 datasets created entirely from scratch with attention to Russian cultural and linguistic specificity, unified prompts, and metrics; (iii) baseline results for both closed-source and open-source models; (iv) a methodology for preventing benchmark leakage, including watermarking for private sets. While our current focus is on Russian, the proposed benchmark provides a replicable methodology for constructing multimodal benchmarks in typologically diverse languages, particularly within the Slavic language family.

    In bk.: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. Association for Computational Linguistics, 2026. P. 2114-2161.

  • Working paper

    Menshikov I. A., Bernadotte A. K., Elvimov N. S.

    Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

    Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image segmentation, but their development depends on well-annotated training datasets. However, there is a notable lack of publicly available MRA datasets with detailed brain vessel annotations. To address this gap, we propose a novel semi-supervised learning lightweight neural network with Hessian matrices on board for 3D segmentation of complex structures such as tubular structures, which we named HessNet. The solution is a Hessian-based neural network with only 6000 parameters. HessNet can run on the CPU and significantly reduces the resource requirements for training neural networks. The accuracy of vessel segmentation on a minimal training dataset reaches state-of-the-art results. It helps us create a large, semi-manually annotated brain vessel dataset of brain MRA images based on the IXI dataset (annotated 200 images). Annotation was performed by three experts under the supervision of three neurovascular surgeons after applying HessNet. It provides high accuracy of vessel segmentation and allows experts to focus only on the most complex important cases. The dataset is available at https://git.scinalytics.com/terilat/VesselDatasetPartly.

    Statistical mechanics. arXiv, 2025
