About

The Centre for Language and Semantic Technologies is part of the HSE Faculty of Computer Science. It was created to conduct research in natural language processing and to develop semantic technologies based on both interpretable artificial intelligence methods and modern machine learning models.

The centre's main objectives are:


1. developing and advancing interpretable machine learning and data mining methods for NLP and recommender systems

2. developing models that enhance the functionality of existing large language models by leveraging additional resources: linguistic models, knowledge models, search models, and planning algorithms

3. developing models and methods for automatic knowledge acquisition using large language models (LLMs), including methods for transfer learning between different languages and different tasks

4. developing models and methods for research, modelling, and analysis within the framework of complex systems theory

5. developing semantic analysis tools based on mathematical methods in formal concept analysis

Structure

International Laboratory of Intelligent Systems and Structural Analysis

We conduct research that enables the integration of structural and neural network representations in applied data analysis tasks.

Laboratory of Models and Methods of Computational Pragmatics

We work on natural language processing (NLP), interpretable machine learning, and data mining, develop recommender systems and services, and advance multimodal clustering and classification methods that enable the creation of user interest profiles across multiple modalities.

Laboratory of Complex Systems Modelling and Control

We conduct fundamental and applied research in the mathematical modelling of complex systems: we study synchronisation phenomena, sudden regime changes, quasi-regularities, and self-organisation; evaluate the effectiveness of rare-event forecasting algorithms; and develop methods for controlling complex systems.

Semantics Analysis Laboratory (in Russian)

We study natural language as a whole within the natural-science paradigm, using the methods of computer science and applied mathematics.

Management

Sergei Kuznetsov

Director of the Centre, Doctor of Sciences, Professor

Marina Zhelyazkova

Deputy Director of the Centre, Candidate of Sciences

Publications

  • Data Analytics and Management in Data Intensive Domains: 25th International Conference, DAMDID/RCDL 2023, Moscow, Russia, October 24–27, 2023, Revised Selected Papers

    This book constitutes the post-conference proceedings of the 25th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2023, held in Moscow, Russia, during 24-27 October 2023.


    The 21 papers presented here were carefully reviewed and selected from 75 submissions. These papers are organized in the following topical sections: Data Models and Knowledge Graphs; Databases in Data Intensive Domains; Machine Learning Methods and Applications; Data Analysis in Astronomy; and Information Extraction from Text. Papers from keynote talks have also been included in this book.

    Communications in Computer and Information Science, vol. 2086. Springer, 2024.

  • Modeling Pruning as a Phase Transition: A Thermodynamic Analysis of Neural Activations

    Activation pruning reduces neural network complexity by eliminating low-importance neuron activations, yet identifying the critical pruning threshold—beyond which accuracy rapidly deteriorates—remains computationally expensive and typically requires exhaustive search. We introduce a thermodynamics-inspired framework that treats activation distributions as energy-filtered physical systems and employs the free energy of activations as a principled evaluation metric. Phase-transition–like phenomena in the free-energy profile—such as extrema, inflection points, and curvature changes—yield reliable estimates of the critical pruning threshold, providing a theoretically grounded means of predicting sharp accuracy degradation. To further enhance efficiency, we propose a renormalized free energy technique that approximates full-evaluation free energy using only the activation distribution of the unpruned network. This eliminates repeated forward passes, dramatically reducing computational overhead and achieving speedups of up to 550× for MLPs. Extensive experiments across diverse vision architectures (MLP, CNN, ResNet, MobileNet, Vision Transformer) and text models (LSTM, BERT, ELECTRA, T5, GPT-2) on multiple datasets validate the generality, robustness, and computational efficiency of our approach. Overall, this work establishes a theoretically grounded and practically effective framework for activation pruning, bridging the gap between analytical understanding and efficient deployment of sparse neural networks.

    Computers, Materials and Continua. 2025. P. 1-24.
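The free-energy idea above can be sketched in toy form. The definition below (F = -T·log Σᵢ exp(-aᵢ/T), treating activation magnitudes as energy levels) and the function names are our own illustrative assumptions, not the paper's actual formulation:

```python
import math

def free_energy(activations, temperature=1.0):
    # Toy definition: F = -T * log(sum_i exp(-a_i / T)),
    # treating activation magnitudes as energy levels.
    # (Illustrative assumption, not the paper's exact formula.)
    z = sum(math.exp(-a / temperature) for a in activations)
    return -temperature * math.log(z)

def free_energy_profile(activations, thresholds, temperature=1.0):
    # Free energy of the surviving activations after pruning all
    # activations whose magnitude falls below each candidate threshold.
    profile = []
    for t in thresholds:
        kept = [a for a in activations if abs(a) >= t]
        profile.append(free_energy(kept, temperature) if kept else math.inf)
    return profile
```

Extrema, inflection points, and curvature changes in such a threshold profile are the kind of phase-transition-like signal the paper uses to locate the critical pruning point without exhaustive accuracy evaluation.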

  • Book chapter

    Dudyrev E., Couceiro M., Kaytoue M. et al.

    Atomic Patterns for Efficient Computation with Pattern Structures

    Pattern Structures is a framework in FCA allowing objects to have complex descriptions, only requiring that the set of descriptions forms a complete meet-semi-lattice. However, some particular descriptions or patterns, such as subgraphs and subsequences, do not necessarily ensure that every pair of descriptions has a unique infimum and ask for additional operations, e.g., anti-chain completion. Moreover, meet-based approaches struggle to generate non-trivial implications for complex data since, in general, they only output closed descriptions. To overcome such limitations, we introduce in this paper an alternative view of pattern structures based on the join operation and the so-called “atomic patterns”. Such atomic patterns correspond to join-irreducible descriptions in the join-semi-lattice of all possible descriptions. They enable an efficient traversal of the description space and the computation of closures, minimal generators, pseudo-intents, and implications, among others, while showing very good computational performance.

    In: Conceptual Knowledge Structures. Second International Joint Conference, CONCEPTS 2025, Cluj-Napoca, Romania, September 8–12, 2025, Proceedings. LNCS, vol. 15941. Cham: Springer, 2025. P. 178-194.
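As a toy illustration of the join-based view: in the simplest pattern lattice, descriptions are attribute sets ordered by inclusion, the join is set union, and the atomic (join-irreducible) patterns are the singletons. The helper names below are hypothetical, not from the paper:

```python
def join(*patterns):
    # Join (least upper bound) in the toy lattice of attribute
    # sets ordered by inclusion: plain set union.
    out = frozenset()
    for p in patterns:
        out |= p
    return out

def atoms(pattern):
    # Atomic patterns of a description: the join-irreducible
    # elements below it -- here, the singleton attribute sets.
    return [frozenset({x}) for x in pattern]
```

Every description then decomposes as the join of its atoms, which is what makes an atom-by-atom traversal of the description space possible.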

  • Working paper

    Menshikov I. A., Bernadotte A. K., Elvimov N. S.

    Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

    Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image segmentation, but their development depends on well-annotated training datasets. However, there is a notable lack of publicly available MRA datasets with detailed brain vessel annotations. To address this gap, we propose HessNet, a lightweight semi-supervised neural network that incorporates Hessian matrices for 3D segmentation of tubular structures. The model has only 6,000 parameters, can run on a CPU, and significantly reduces the resource requirements for training. Its vessel segmentation accuracy on a minimal training dataset reaches state-of-the-art results. Using HessNet, we created a large, semi-manually annotated brain vessel dataset of brain MRA images based on the IXI dataset (200 annotated images). Annotation was performed by three experts under the supervision of three neurovascular surgeons after applying HessNet, which provides high vessel segmentation accuracy and allows the experts to focus only on the most complex and important cases. The dataset is available at https://git.scinalytics.com/terilat/VesselDatasetPartly.

    Statistical Mechanics. arXiv, 2025.
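The classical cue behind Hessian-based vessel filters (such as the Frangi filter mentioned in the abstract) is the eigenvalue structure of the image Hessian: along a bright tubular structure, one eigenvalue is near zero while the others are strongly negative. A minimal 2D finite-difference sketch, with hypothetical function names and not HessNet's actual architecture:

```python
import math

def hessian_2d(f, x, y, h=1e-3):
    # Finite-difference Hessian of a scalar field f(x, y):
    # second derivatives f_xx, f_yy and the mixed derivative f_xy.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def hessian_eigenvalues(fxx, fxy, fyy):
    # Closed-form eigenvalues of the symmetric 2x2 Hessian,
    # returned in ascending order; vesselness filters rate a
    # pixel by the ratio and magnitude of these eigenvalues.
    mean = (fxx + fyy) / 2
    d = math.sqrt(((fxx - fyy) / 2) ** 2 + fxy ** 2)
    return mean - d, mean + d
```

In 3D MRA, the same idea applies to the 3x3 Hessian at each voxel; HessNet builds such Hessian information directly into a small learned model instead of a hand-tuned filter.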

All publications