Project Groups
Machine learning on graphs - leader Dmitry Ignatov
The group develops methods and models for vectorizing network data in machine learning problems on graphs, and new recommender system methods based on vector network models. Machine learning methods for structural information and recommender systems typically rely on manual feature selection, sampling, or matrix factorization. Such methods are often tied to a specific task and do not scale to big data. In recent years, vector representation models for graphs (graph embeddings) have become an active research topic, with more than 400 models proposed in the last three years. Their main disadvantage is the lack of a universal design that supports a variety of graphs and different types of vertex and edge features while generalizing to dynamically changing data and keeping computational complexity low. We plan to build several types of models that combine approaches based on neighborhood sampling, structural similarity, dual incidence graph embedding, and graph convolutional neural networks. These models should process directed or undirected, weighted or unweighted graphs, possibly with feature information on vertices and edges. The vector models will be built to maximize quality on typical machine learning problems on graphs: community detection, multi-class classification, link prediction, prediction of connections in knowledge graphs, and recommendation. We also plan to study the relationship between spectral graph clustering, singular value decomposition of incidence matrices, and vector representations for object similarity graphs.
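As a small illustration of the relationship between spectral methods and decompositions of incidence matrices, the sketch below checks a standard linear-algebra fact on a hypothetical toy graph: for an oriented incidence matrix B, the product B B^T equals the graph Laplacian L = D - A, so the squared singular values of B are the eigenvalues of L, and the left singular vectors of B give the coordinates used in spectral clustering. The graph, the variable names, and the use of NumPy are assumptions made for illustration and are not taken from the project itself.

```python
import numpy as np

# Hypothetical toy graph: 4 vertices, 5 undirected edges.
# Each edge is given an arbitrary orientation (u -> v) to build B.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# Oriented incidence matrix B (vertices x edges): +1 at u, -1 at v.
B = np.zeros((n, len(edges)))
for j, (u, v) in enumerate(edges):
    B[u, j] = 1.0
    B[v, j] = -1.0

# Graph Laplacian L = D - A built directly from the adjacency matrix.
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
L = np.diag(A.sum(axis=1)) - A

# The Laplacian can also be recovered from the incidence matrix.
laplacian_from_incidence = B @ B.T

# SVD of B: singular values come back in descending order.
U, s, _ = np.linalg.svd(B, full_matrices=False)

# Eigenvalues of L come back in ascending order.
eigvals = np.linalg.eigvalsh(L)

# Two-dimensional vertex coordinates from the left singular vectors
# with the smallest nonzero singular values (the last column belongs
# to the zero singular value of a connected graph and is skipped).
spectral_coords = U[:, -3:-1]

print(np.allclose(laplacian_from_incidence, L))
print(np.allclose(np.sort(s ** 2), eigvals, atol=1e-8))
```

The same coordinates could be passed to any clustering routine; which singular vectors to keep, and how to scale them, is a design choice that depends on the task.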
Working group members:
2020-2021
1. Makarov Ilya, senior lecturer of the Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science;
2. Zhukov Leonid, professor of the Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science;
3. Kiselev Dmitry, 1st year graduate student at the Graduate School of Computer Science at the National Research University Higher School of Economics;
4. Muratova Anna, 2nd year graduate student at the Graduate School of Computer Science at the National Research University Higher School of Economics;
5. Nikolic Stefan, 1st year student of the master's program "Data Science" of the Faculty of Computer Science;
6. Senderovich Maria, research assistant at the laboratory of models and methods of computational pragmatics, Faculty of Computer Science.
2021-2022
1. Makarov Ilya, associate professor of the Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science;
2. Kiselev Dmitry, 3rd year graduate student at the Graduate School of Computer Science at the National Research University Higher School of Economics;
3. Muratova Anna, 4th year graduate student at the Graduate School of Computer Science at the National Research University Higher School of Economics;
4. Nikolic Stefan, 1st year graduate student at the Graduate School of Computer Science at the National Research University Higher School of Economics;
5. Gorshkov Sergey, 1st year graduate student at the Graduate School of Computer Science at the National Research University Higher School of Economics;
6. Yakovleva Alexandra, 1st year student of the master's program of the Faculty of Computer Science.
Publications
1. Ilya Makarov, Ksenia Korovina, Dmitrii Kiselev: JONNEE: Joint Network Nodes and Edges Embedding. IEEE Access 9: 144646-144659 (2021)
2. Tianxing M, Lushnov M, Ignatov DI, Shichkina YA, Zhukova NA, Vodyaho AI. 2021. An ontology-based approach to the analysis of the acid-base state of patients at operative measures. PeerJ Computer Science 7:e777
Cross-linguistic methods for identifying the meanings of polysemantic words - leader Nikolay Arefyev
The group develops methods and tools for extracting the meanings of ambiguous words that are applicable to various natural languages. Polysemy of words and phrases is a basic property of natural languages and causes significant difficulties in building applications for automatic text analysis. The goal of the project is to develop methods and tools for extracting the meanings of polysemantic words that rely on unlabeled text collections, which are available in large volumes, and do not require expensive manual linguistic annotation. Unlike previous work on this topic, this project intends to use, from the outset, recently introduced cross-lingual neural language models (such as mBERT, XLM, XLM-R) to create methods and tools applicable to a large number of natural languages. They are expected to apply to the approximately 100 natural languages on which these models are trained, including Russian and English.
Working group members:
1. Arefyev Nikolay, junior researcher at the laboratory for models and methods of computational pragmatics, department of data analysis and artificial intelligence, Faculty of Computer Science, Higher School of Economics;
2. Rachinsky Maxim, master’s program “Data Science”, Faculty of Computer Science, Higher School of Economics, 1st year;
3. Panchenko Alexander, researcher at the Center for Computational and Engineering Sciences, Skolkovo Institute of Science and Technology.
Project team members involved periodically:
4. Kazakov Roman, bachelor’s program “Fundamental and Computational Linguistics”, Faculty of Computer Science, Higher School of Economics, 4th year;
5. Chomsky Daniil, bachelor’s program “Applied Mathematics and Informatics”, Computational Mathematics and Computer Science of Moscow State University, 4th year.