The Faculty of Computer Science invites international students to take research internships lasting from two to six months.
Process Simulation and Log Generation
Gena is a simple tool that generates event logs by executing process models. It is useful for testing and evaluating process mining algorithms. Gena supports two important notations used in business process modelling and process mining: Petri nets and basic BPMN models. However, Gena can be improved in many ways to support more sophisticated simulation scenarios. Interns have the opportunity to join this project.
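To illustrate what generating an event log by executing a process model means, here is a minimal Python sketch of a random walk over a Petri net. It is not Gena's implementation or API: the net, the activity names, and the functions `enabled`, `simulate_trace`, and `generate_log` are all hypothetical, and every arc is assumed to have weight 1.

```python
import random

# Hypothetical toy Petri net (not taken from Gena).
# Each transition maps to (input places, output places); all arcs have weight 1.
INITIAL_MARKING = {"start": 1}
TRANSITIONS = {
    "register": (["start"], ["p1"]),
    "check": (["p1"], ["p2"]),
    "approve": (["p2"], ["end"]),
    "reject": (["p2"], ["end"]),
}


def enabled(marking, transitions):
    """Transitions whose input places all hold at least one token."""
    return [t for t, (ins, _) in transitions.items()
            if all(marking.get(p, 0) >= 1 for p in ins)]


def simulate_trace(transitions, initial_marking, rng, max_steps=100):
    """Fire randomly chosen enabled transitions until none is enabled."""
    marking = dict(initial_marking)
    trace = []
    for _ in range(max_steps):
        candidates = enabled(marking, transitions)
        if not candidates:
            break
        t = rng.choice(candidates)
        ins, outs = transitions[t]
        for p in ins:                      # consume one token per input place
            marking[p] -= 1
        for p in outs:                     # produce one token per output place
            marking[p] = marking.get(p, 0) + 1
        trace.append(t)
    return trace


def generate_log(n_traces, seed=0):
    """An event log here is simply a list of traces (lists of activity names)."""
    rng = random.Random(seed)
    return [simulate_trace(TRANSITIONS, INITIAL_MARKING, rng)
            for _ in range(n_traces)]


log = generate_log(20)
```

Every trace in `log` starts with `register` and `check` and ends with either `approve` or `reject`. Timestamps, resources, and noise, which a realistic simulation scenario would need, are deliberately left out.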
Process Mining Using Neural Networks
Process mining is a modern research discipline that aims to improve (mostly business) processes on the basis of so-called event logs. Event logs are created by the information systems that support processes (process-aware information systems, PAIS) as a technical by-product of their operation. At the same time, event logs are a source of valuable information about how processes actually run. Thus, event logs can be used to extract process models (the discovery task), to check the real behaviour of a process against its prescribed model (the conformance checking task), and, ultimately, to improve processes (the process enhancement task).
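As a rough illustration of the discovery and conformance checking tasks, the Python sketch below builds a directly-follows graph from a toy event log and then checks whether a trace uses only observed transitions. The directly-follows graph is just one very simple model, chosen here for brevity; the log and the function names `directly_follows` and `fits` are hypothetical.

```python
from collections import Counter


def directly_follows(log):
    """Discovery: count how often activity a is immediately followed by b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def fits(trace, dfg):
    """Crude conformance check: every step of the trace is an edge of the graph."""
    return all((a, b) in dfg for a, b in zip(trace, trace[1:]))


# Hypothetical event log: each trace is the sequence of activities of one case.
log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]
dfg = directly_follows(log)
```

Here `fits(["register", "check", "reject"], dfg)` is true, while `fits(["check", "register"], dfg)` is false, because `register` was never observed directly after `check`.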
By now, there are many algorithms, methods, and techniques for extracting and analyzing process models. They are mostly based on purely algorithmic or mixed statistical approaches. However, only very few works use neural network-based approaches to process event data; see, for example, Shunin, Zubkova, and Shershakov (2018).
The use of neural networks and deep learning for solving process mining problems seems very promising, especially for working with the large event logs that one has to deal with in practice. The project aims to refine existing process mining algorithms and to develop new ones, as well as methods and tools, using neural networks.
 Wil van der Aalst. Process Mining: Data Science in Action. 2nd ed. Springer, 2016.
 T. Shunin, N. Zubkova, and S. Shershakov. Neural Approach to the Discovery Problem in Process Mining. In Proceedings: AIST 2018 Conference, LNCS 11179, pp. 261–273, Springer, 2018.
Universality Classes in Sandpiles
The project deals with self-organized critical systems and the prediction of rare events generated by them. Self-organized criticality (SOC) is ascribed to various observed systems and processes related to physics and economics. The first model with SOC was defined by Bak, Tang, and Wiesenfeld in 1987. They introduced a mechanism (the BTW mechanism) that balances two multi-scale processes: a constant slow loading and a quick stress release. This mechanism results in power-law probability distributions that characterize the critical state, but it lacks parameters that would allow the model's regularities to be fitted to observations.
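The interplay of slow loading and quick stress release can be sketched in a few lines of Python. The code below is a minimal version of the standard BTW sandpile on a square lattice with open boundaries and a toppling threshold of 4. The lattice size, the number of grains, and the function names `topple` and `simulate` are illustrative assumptions, not the project's settings.

```python
import random


def topple(grid, n, start):
    """Relax the lattice after a grain lands at `start`.

    Returns the avalanche size, measured as the number of topplings.
    """
    size = 0
    unstable = [start]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < 4:
            continue
        grid[i][j] -= 4                    # quick stress release
        size += 1
        if grid[i][j] >= 4:
            unstable.append((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:  # grains crossing the border are lost
                grid[a][b] += 1
                if grid[a][b] >= 4:
                    unstable.append((a, b))
    return size


def simulate(n=20, grains=5000, seed=0):
    """Drop grains one at a time at random sites and record avalanche sizes."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        i, j = rng.randrange(n), rng.randrange(n)
        grid[i][j] += 1                    # slow loading
        sizes.append(topple(grid, n, (i, j)))
    return grid, sizes


grid, sizes = simulate()
```

After an initial transient, a histogram of the nonzero entries of `sizes` on a log-log scale is where one would look for the power law. A larger lattice and many more grains are needed before the exponent can be estimated with any confidence.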
The planned contribution of this project is twofold. First, we intend to establish that the BTW mechanism can be designed on a square lattice in such a way that the power-law exponent becomes adjustable. This contribution will extend the current picture of sandpile models, in which two universality classes are realized in various settings. The extension is based on the idea that several processes of stress propagation should be united into a single event if they occur next to one another in space and time. The second direction of this project pursues the prediction of extremes. We plan to address the prediction problem in sandpiles, exposing the predictability of events that are so large that they occur only in the supercritical state. In other words, extra system loading is required to make the system generate extremes. This natural property underlies predictability. The research has to reconcile general claims regarding unpredictability in sandpiles, which may describe typical events, with some examples of efficient predictions related to large events.
Project 1. Coordinates and singularities of misorientation spaces
A misorientation is a measure of the displacement of one crystal lattice relative to another. The space of all misorientations is a two-sided identification space G1\SO(3)/G2, where G1 and G2 are the crystallographic point groups of the lattices. One can show that the misorientation space is a 3-dimensional orbifold. The underlying topological space of this orbifold depends on the crystallographic groups. However, surprisingly often this space is homeomorphic to the 3-sphere, by the Poincaré conjecture. During the internship, you can do one of the following two tasks (or both). (1) For particular choices of crystallographic groups, construct convenient coordinates on the corresponding misorientation spaces (in particular, you can try to prove the homeomorphism to the 3-sphere without appealing to the Poincaré conjecture). (2) As a 3-orbifold, each misorientation space has a 3-valent weighted graph as its orbifold locus, or singularity. The task is to describe the orbifold singularities for all pairs of crystallographic point groups. This problem can be approached either by visualizing things in Python or by carefully examining some known papers in invariant theory and low-dimensional topology.
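As a starting point for experiments in Python, the sketch below computes the smallest rotation angle within one point of the double coset space G1\SO(3)/G2. It assumes that both lattices are cubic, so G1 = G2 is the group of 24 proper rotations of the cube. The helper names (`cubic_rotations`, `disorientation_angle`, `rot_z`) are hypothetical, and only the standard library is used.

```python
import math
from itertools import permutations, product


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def det(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cubic_rotations():
    """The 24 proper rotations of the cube: signed permutation matrices, det +1."""
    group = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[i] if j == perm[i] else 0 for j in range(3)]
                 for i in range(3)]
            if det(m) == 1:
                group.append(m)
    return group


def rotation_angle(m):
    """Rotation angle in degrees, from trace(m) = 1 + 2*cos(angle)."""
    c = (m[0][0] + m[1][1] + m[2][2] - 1) / 2
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def rot_z(deg):
    """Rotation by `deg` degrees about the z axis."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


def disorientation_angle(r, g1, g2):
    """Smallest rotation angle among the elements a*r*b with a in g1, b in g2."""
    return min(rotation_angle(matmul(matmul(a, r), b)) for a in g1 for b in g2)


G = cubic_rotations()
angle_45 = disorientation_angle(rot_z(45), G, G)
angle_90 = disorientation_angle(rot_z(90), G, G)
```

A rotation by 90 degrees about z is itself a symmetry of the cube, so `angle_90` is zero up to rounding, whereas `angle_45` stays at 45 degrees. Replacing `G` with other point groups gives a brute-force way to probe other misorientation spaces numerically.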
Project 2. TDA and FCA in experiments with neurons
There is a famous experiment confirming the existence of so-called place cells in the mammalian hippocampus. In this experiment, a mouse moves freely in a labyrinth while the activity of a set of its neurons is recorded. The general goal is to understand the neurons' activation patterns and their relation to the shape of the labyrinth, the trajectory of the rodent, and so on. In general, one can reconstruct the topology of the labyrinth from neural activity using the nerve lemma. Instead of taking the nerves of covers, one can try to find implications between different neurons by constructing the lattice of formal concepts that relates neural activity to the physical position of the mouse. One can then study the topology of this lattice instead of the nerve. During the internship, you will join our research team working on this subject and work with data obtained in real experiments. Your particular task will be to understand the theoretical basics of formal concept analysis and to review the packages that can be combined with existing TDA packages in order to make progress on the task described above. At a minimum, we expect that you will be able to describe some basic implications between the activities of different groups of neurons.
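The following Python sketch shows, on an invented toy data set, what a formal context, its concepts, and an implication between neurons look like. The time bins, the neuron labels, and the function names are hypothetical. Concepts are enumerated by brute force over attribute subsets, which is only feasible for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: for each time bin, the set of neurons that fired.
CONTEXT = {
    "t1": {"n1", "n2"},
    "t2": {"n1", "n2", "n3"},
    "t3": {"n3"},
    "t4": {"n2"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attrs):
    """Time bins in which all the given neurons are active."""
    return {o for o, active in CONTEXT.items() if attrs <= active}


def intent(objs):
    """Neurons that are active in every one of the given time bins."""
    if not objs:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[o] for o in objs))


def concepts():
    """All formal concepts (extent, intent), found by closing every attribute subset."""
    found = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, k):
            ext = extent(set(subset))
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found


def implication_holds(premise, conclusion):
    """premise -> conclusion holds iff conclusion lies in the closure of premise."""
    return conclusion <= intent(extent(premise))


lattice = concepts()
```

In this toy context, `implication_holds({"n1"}, {"n2"})` is true (whenever n1 fires, n2 fires as well), while the converse implication fails because of time bin t4.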
Geometry of J
The first goal of this project is to further understand the geometric properties of the metric J. For instance, one objective would be to study whether the set P(R^d), endowed with J, is a geodesic metric space, and to characterize its shortest paths and, eventually, its curvature properties.
A related line of investigation is the study of barycenters with respect to the metric J. One natural question is whether such barycenters exist, whether they are unique, and whether they can be estimated consistently. These questions could be studied in the light of the recent paper by Le Gouic and Loubes (2017).
Given the recent work by Ahidar-Coutrix et al. (2018), another interesting question is whether one can establish so-called variance inequalities for the metric J. Such an inequality was shown in Ahidar-Coutrix et al. (2018) to guarantee fast convergence rates for certain estimators of barycenters and would be of great interest from a statistical point of view.
Probably Approximately Correct Learning of Ontologies Based on Description Logics
In description logic knowledge bases, there may be a need to ensure that the terminological part of the base is complete in the sense that it captures all relevant relations between concepts. There is a method for Completing Description Logic Knowledge Bases using Formal Concept Analysis, but it takes exponential time in the worst case. During the internship, you will have to develop a more efficient probably approximately correct (PAC) version of this method. By asking implication questions to a domain expert, the method should be able to approximate the relationships between a given set of concept names that hold in the expert’s model and enrich the knowledge base with the newly discovered relationships. Alternatively (and somewhat more ambitiously), we may choose to study PAC learnability of ontologies based on specific description logics (see Exact Learning of Lightweight Description Logic Ontologies for results concerning exact rather than PAC learning of such ontologies).
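One generic ingredient of PAC-style learning can be sketched in Python: an exact check that the current set of implications is correct is replaced by a test on randomly sampled attribute sets. If the hypothesis disagrees with the target on more than a fraction eps of the sampling distribution, then ceil(ln(1/delta)/eps) independent samples reveal a counterexample with probability at least 1 - delta. The sketch is not the method of the papers mentioned above. The uniform distribution over subsets, the oracle `target_closure`, and all function names are assumptions made for illustration.

```python
import math
import random


def closure(attrs, implications):
    """Close a set of attributes under implications given as (premise, conclusion)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def sample_size(eps, delta):
    """Samples needed so that an eps-bad hypothesis survives with probability <= delta."""
    return math.ceil(math.log(1 / delta) / eps)


def find_counterexample(hypothesis, target_closure, attributes, eps, delta, rng):
    """Return a sampled set on which hypothesis and target disagree, or None."""
    for _ in range(sample_size(eps, delta)):
        subset = {a for a in attributes if rng.random() < 0.5}  # uniform over subsets
        if closure(subset, hypothesis) != target_closure(subset):
            return subset
    return None


# Hypothetical target theory, standing in for the domain expert.
TARGET = [({"a"}, {"b"})]


def expert(subset):
    return closure(subset, TARGET)


rng = random.Random(0)
counterexample = find_counterexample([], expert, ["a", "b", "c"], 0.01, 0.01, rng)
```

With the empty hypothesis, every sampled set that contains `a` but not `b` is a counterexample, so one is found almost immediately. When the hypothesis equals `TARGET`, the function returns `None`.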
Collaborative Conceptual Exploration
Conceptual exploration is a family of knowledge-acquisition techniques based on formal concept analysis (Ganter and Obiedkov 2016). The goal is to build a complete (with respect to a fixed language) implicational theory of a domain by posing queries to a domain expert. When properly implemented, it is a great tool that can help organise the process of scientific discovery. The existing conceptual exploration methods assume a single oracle with thorough knowledge of the domain. We aim to extend this model to a more practically relevant setting in which, in place of a single omniscient and unerring oracle, we have multiple experts with incomplete or even contradictory knowledge of the domain. Such extensions are known for other active learning models (Donmez and Carbonell 2008, Yan et al. 2011, Chakraborty 2020) but will be novel in conceptual exploration. Adapting exploration procedures to work with several imperfect experts requires modifying the mathematical model; developing new efficient algorithms; defining a strategy for experts’ interaction that allows conflicts between contradictory opinions to be resolved; developing techniques to combine the results of independent work by several expert groups; and providing the ability to withdraw previous decisions in light of new information (support for nonmonotonic reasoning). As for the joint work of experts, the learning methods to be developed fall into two groups. If, for a chosen description language, there exists a unique correct domain description (up to semantically equivalent syntactic transformations), we need methods that, under certain (e.g., probabilistic) assumptions on the completeness and correctness of the experts’ knowledge, return an approximate (in a certain formal sense) domain description.
If there is no unique correct description and ontology construction is aimed at establishing consensus between various expert representations of the domain (which may be important, for example, for the humanities), there is a need for methods that can identify the knowledge shared by all experts and highlight positions on which experts disagree. During the internship, we will fix a precise problem statement of learning an implicational theory from queries to multiple experts and will try to adapt the existing learning algorithms to this setting.
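To make the multi-expert setting more concrete, here is a small Python sketch of one possible way to pose an implication query to several experts and to separate shared knowledge from disagreement. It is purely illustrative and is not the procedure to be developed in the project. Each expert is modelled as a hypothetical list of known objects, an expert abstains when none of their objects matches the premise, and the four outcome labels are invented for this example.

```python
def make_expert(known_objects):
    """An expert answers from the objects (attribute sets) they happen to know."""
    def oracle(premise, conclusion):
        relevant = [obj for obj in known_objects if premise <= obj]
        if not relevant:
            return None                    # no opinion: premise never observed
        return all(conclusion <= obj for obj in relevant)
    return oracle


def ask_experts(premise, conclusion, experts):
    """Collect each expert's verdict: True, False, or None (abstains)."""
    return {name: oracle(premise, conclusion) for name, oracle in experts.items()}


def aggregate(verdicts):
    """Summarise the verdicts as accepted, rejected, disputed, or unknown."""
    votes = [v for v in verdicts.values() if v is not None]
    if not votes:
        return "unknown"
    if all(votes):
        return "accepted"
    if not any(votes):
        return "rejected"
    return "disputed"


# Hypothetical experts with incomplete and partly conflicting knowledge.
EXPERTS = {
    "A": make_expert([{"bird", "flies"}]),
    "B": make_expert([{"bird", "flies"}, {"bird", "swims"}]),
    "C": make_expert([{"fish", "swims"}]),
}

result = aggregate(ask_experts({"bird"}, {"flies"}, EXPERTS))
```

Here `result` is `"disputed"`: expert A has seen only flying birds, expert B knows a bird that swims and is not recorded as flying, and expert C abstains. A disputed answer is exactly the kind of position that a consensus-oriented method would need to highlight.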
Error Detection and Correction in Russian Texts Written by Non-native Speakers
General-purpose spellcheckers are usually designed to handle errors made by native speakers. Errors made by non-native speakers, e.g., language learners, are quite different, and they present a serious challenge for automated detection and correction. During the internship, you will study various approaches to this challenge, such as those described in the paper Grammar Error Correction in Morphologically-Rich Languages: The Case of Russian, and try to reproduce and improve on the results presented in the literature.
Frequently Asked Questions
We invite students enrolled in bachelor's, master's, and doctoral programmes. Experience in the area of the internship you are interested in plays an important role.
No, you do not need to provide such a certificate. However, the interview with your prospective academic supervisor will be conducted in English.
All internships are free of charge.
Yes, we will cover airfare and accommodation.
Cecilia Tosciri (Oxford)
In Moscow I worked with two researchers: Fedor Ratnikov, a professor at HSE University who works on the LHCb experiment at CERN, and Andrey Ustyuzhanin, who is in charge of joint projects between CERN and Yandex. At the start of the internship, Andrey Ustyuzhanin and I discussed my project in detail: he asked me about the problems that come up in my research and recommended relevant papers. This was very useful: I learned about various algorithms for finding similarities between objects. But the trip to Moscow was memorable for more than the research. I was delighted with the local food; it was like spending the holidays with a hospitable grandmother. Overall, the trip went well, and I even learned a little Russian.
Belhal Karimi (École Polytechnique, Paris)
I really liked how the research process was organised: students and supervisors worked on projects together. Every day there were 6–10 researchers in the lab, and we helped one another informally and shared the results of our experiments. I may come to Moscow again: my academic supervisor at École Polytechnique often travels to Russia. I enjoyed the trip, especially the Mayakovskaya area, where I lived, and a weekend spent in Saint Petersburg.
Leo Botel (École Normale Supérieure)
I had wanted to go to Russia for a long time, so I started reading about universities in Moscow with strong data analysis programmes. HSE was among them. When I first arrived at HSE University, I intended to study applications of machine learning to the construction of social graphs. Sergei [Kuznetsov] suggested a new topic: it turned out to be quite difficult and required a strong mathematical background. But over the two months I spent in Moscow, I was able to improve my skills, which will be useful in my future research.
Diego Granziol (Oxford)
I cannot overstate how lucky I was, and what an honour it was, to come to Russia and work with the whole Bayesian Methods Group. The atmosphere was warm and welcoming. Timur [Garipov] is a truly amazing coder, and without him I would not have got any of my own ideas off the ground. I think he has the potential to do truly exceptional research, and I am incredibly happy to follow his progress. I thank Dmitry [Vetrov], whose final contribution to the paper we submitted to NeurIPS was absolutely essential, for our regular meetings, advice, questions, support, and time.