The Faculty of Computer Science invites international students to participate in research internships. The duration of the internship is two to six months.
Process Simulation and Log Generation
Gena is a simple tool that generates event logs by executing process models. It is useful for testing and evaluating process mining algorithms. Gena supports two important notations used in business process modelling and process mining: Petri nets and basic BPMN models. However, Gena could be improved in many ways to support more sophisticated simulation scenarios. Interns have an opportunity to join this project.
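To illustrate what generating a log by executing a model means, here is a minimal sketch of the Petri net "token game" in Python. It is not Gena's implementation: the net, the place and transition names, and the functions `enabled`, `simulate_trace` and `generate_log` are all hypothetical, chosen only for this example.

```python
import random

# Hypothetical toy Petri net (an assumption for illustration, not taken from Gena):
# each transition maps to (set of input places, set of output places).
TRANSITIONS = {
    "register": ({"start"}, {"p1"}),
    "check": ({"p1"}, {"p2"}),
    "approve": ({"p2"}, {"end"}),
    "reject": ({"p2"}, {"end"}),
}


def enabled(marking):
    """Return the transitions whose input places all hold at least one token."""
    return sorted(
        t for t, (inputs, _) in TRANSITIONS.items()
        if all(marking.get(p, 0) > 0 for p in inputs)
    )


def simulate_trace(rng, max_steps=100):
    """Play the token game from the initial marking and record fired transitions."""
    marking = {"start": 1}
    trace = []
    for _ in range(max_steps):
        choices = enabled(marking)
        if not choices:  # no transition can fire: the case is finished
            break
        t = rng.choice(choices)
        inputs, outputs = TRANSITIONS[t]
        for p in inputs:  # consume one token from every input place
            marking[p] -= 1
        for p in outputs:  # produce one token in every output place
            marking[p] = marking.get(p, 0) + 1
        trace.append(t)
    return trace


def generate_log(n_traces, seed=0):
    """Generate an event log as a list of traces (lists of activity names)."""
    rng = random.Random(seed)
    return [simulate_trace(rng) for _ in range(n_traces)]
```

Calling `generate_log(20)` yields twenty traces, each starting with `register` and `check` and ending with either `approve` or `reject`. More sophisticated scenarios (timestamps, resources, transition priorities) would need a richer model than this sketch.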
Process Mining Using Neural Networks
Process mining is a modern research discipline that aims at improving (mostly business) processes on the basis of so-called event logs. Event logs are created by information systems that support processes (process-aware information systems, PAIS) as a technical by-product of their operation. At the same time, event logs are a source of valuable information about how processes actually run. Thus, event logs can be used to extract process models (the discovery task), to check the real behaviour of a process against its prescribed model (the conformance checking task), and, ultimately, to improve processes (the process enhancement task).
By now, there are a large number of algorithms, methods and techniques for extracting and analyzing process models. They are mostly based on purely algorithmic or mixed statistical approaches. However, only very few works use neural-network-based approaches to process event data. For example, see Shunin, Zubkova, and Shershakov (2018).
The use of neural networks and deep learning for solving process mining problems seems to be very promising, especially for working with the large event logs that one has to deal with in practice. The project is dedicated to elaborating existing process mining algorithms and developing new ones, as well as methods and tools that use neural networks.
 Wil van der Aalst. Process Mining: Data Science in Action. 2nd ed. Springer, 2016.
 T. Shunin, N. Zubkova, and S. Shershakov. Neural Approach to the Discovery Problem in Process Mining. In Proceedings: AIST 2018 Conference, LNCS 11179, pp. 261–273, Springer, 2018.
Universality Classes in Sandpiles
The project deals with self-organized critical systems and the prediction of rare events generated by them. Self-organized criticality (SOC) is attributed to various observed systems and processes in physics and economics. The first model with SOC was defined by Bak, Tang, and Wiesenfeld in 1987. They introduced a mechanism (the BTW mechanism) which balances two multi-scale processes: a constant slow loading and a quick stress release. This mechanism results in power-law probability distributions that characterize the critical state, but it lacks parameters to fit the model regularities to observations.
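The slow loading and quick stress release of the BTW mechanism can be sketched in a few lines of Python. This is the standard textbook sandpile on a square lattice with open boundaries, not the modified model that the project aims to construct. The function names `relax` and `run`, the lattice size and the number of grains are assumptions made for this example.

```python
import random


def relax(grid, threshold=4):
    """Topple unstable cells until the grid is stable; return the avalanche size.

    The avalanche size is counted here as the total number of topplings.
    """
    n = len(grid)
    size = 0
    unstable = [(i, j) for i in range(n) for j in range(n)
                if grid[i][j] >= threshold]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < threshold:  # already relaxed by an earlier toppling
            continue
        grid[i][j] -= 4  # quick stress release: the cell sheds four grains
        size += 1
        if grid[i][j] >= threshold:
            unstable.append((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:  # grains leaving the lattice are lost
                grid[a][b] += 1
                if grid[a][b] >= threshold:
                    unstable.append((a, b))
    return size


def run(n=20, grains=5000, seed=0):
    """Slow loading: drop grains one at a time at random sites, relaxing after each."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        i, j = rng.randrange(n), rng.randrange(n)
        grid[i][j] += 1
        sizes.append(relax(grid))
    return grid, sizes
```

A histogram of the non-zero values in `sizes`, collected after an initial transient, is the usual way to look for the power-law distribution mentioned above. The sketch has no tunable exponent, which is exactly the limitation the project addresses.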
The planned contribution of this project is twofold. First, we intend to establish that the BTW mechanism can be designed on a square lattice in such a way that the power-law exponent becomes adjustable. This contribution will extend the current picture of sandpile models, in which two universality classes are realized in various settings. The extension is based on the idea that several processes of stress propagation should be united into a single event if they occur next to one another in space and time. The second direction of this project pursues the prediction of extremes. We plan to address the prediction problem in sandpiles, exposing the predictability of events that are so large that they occur only in the supercritical state. In other words, extra system loading is required to make the system generate extremes. This natural property underlies predictability. The research has to reconcile general claims of unpredictability in sandpiles, which may describe typical events, with some examples of efficient predictions of large events.
Project 1. Coordinates and singularities of misorientation spaces
A misorientation is a measure of the displacement of one crystal lattice relative to another. The space of all misorientations is a two-sided identification space G1\SO(3)/G2, where G1 and G2 are the crystallographic point groups of the lattices. It can be seen that the misorientation space is a 3-dimensional orbifold. The underlying topological space of this orbifold depends on the crystallographic groups. However, surprisingly often this space is homeomorphic to the 3-sphere, by the Poincaré conjecture. During the internship, you can do one of the following two tasks (or both). (1) For particular choices of crystallographic groups, construct convenient coordinates on the corresponding misorientation spaces (in particular, you can try to prove the homeomorphism to the 3-sphere without referring to the Poincaré conjecture). (2) As a 3-orbifold, each misorientation space has a 3-valent weighted graph as its orbifold locus, or singularity. The task is to describe the orbifold singularities for all pairs of crystallographic point groups. This problem can be approached either by visualizing things in Python or by carefully examining some known papers in invariant theory and low-dimensional topology.
Project 2. TDA and FCA in experiments with neurons
There is a famous experiment confirming the existence of so-called place cells in the mammalian hippocampus. In this experiment, a mouse moves freely in a labyrinth while the activity of some set of its neurons is recorded. The general goal is to understand the neurons' activation patterns and their relation to the shape of the labyrinth, the trajectory of the rodent, and so on. Generally, one can reconstruct the topology of the labyrinth from neural activity, based on the nerve lemma. Instead of taking the nerves of the covers, one can try to find implications between different neurons by constructing the lattice of formal concepts that relates the neural activity to the physical position of the mouse. One can then study the topology of this lattice instead of the nerve. During the internship, you will join our research team working on this subject and work with data obtained in real experiments. Your particular task will be to understand the theoretical basics of formal concept analysis and to review the packages that can be combined with existing TDA packages in order to carry out the task described above. As a minimum, we expect that you will describe some basic implications between the activities of different groups of neurons.
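As a small illustration of the formal concept analysis side of this project, the sketch below computes all formal concepts of a tiny binary context and checks an implication between neurons. The context is invented for the example and is not experimental data. The region and neuron names, and the functions `extent`, `intent`, `concepts` and `implication_holds`, are hypothetical. The enumeration is naive (exponential in the number of attributes) and is meant only to show the definitions at work.

```python
from itertools import combinations

# Hypothetical toy context (not real data): for each region of the labyrinth
# (object), the set of neurons (attributes) that were active there.
CONTEXT = {
    "region_A": {"n1", "n2"},
    "region_B": {"n2", "n3"},
    "region_C": {"n1", "n2", "n3"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def extent(attrs):
    """Objects that have all the given attributes."""
    return {g for g, a in CONTEXT.items() if attrs <= a}


def intent(objs):
    """Attributes shared by all the given objects (all attributes if objs is empty)."""
    if not objs:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[g] for g in objs))


def concepts():
    """Enumerate all formal concepts (extent, intent) by closing every attribute subset."""
    found = set()
    for r in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(sorted(ATTRIBUTES), r):
            ext = extent(set(attrs))
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found


def implication_holds(premise, conclusion):
    """premise -> conclusion holds iff conclusion lies in the closure of premise."""
    return conclusion <= intent(extent(premise))
```

In this toy context, `implication_holds({"n1"}, {"n2"})` is true (wherever n1 fires, n2 fires too), while the converse implication is false. The set returned by `concepts()`, ordered by inclusion of extents, is the concept lattice whose topology could then be studied.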
Geometry of J
The first goal of this project is to further understand the geometric properties of this metric. For instance, one objective would be to study whether the set P(R^d), endowed with J, is a geodesic metric space, characterize its shortest paths and eventually its curvature properties.
A related field of investigation is to study barycenters relative to metric J. One natural question is whether such barycenters exist, are unique and if one can estimate them in a consistent way. These questions could be studied in the light of the recent paper by Le Gouic and Loubes (2017).
Given the recent work by Ahidar-Coutrix et al. (2018), another interesting question is whether one can establish so-called variance inequalities for the metric J. Such an inequality was shown to guarantee fast convergence rates for certain estimators of barycenters in Ahidar-Coutrix et al. (2018) and would be of great interest from a statistical point of view.
Probably Approximately Correct Learning of Ontologies Based on Description Logics
In description logic knowledge bases, there may be a need to ensure that the terminological part of the base is complete in the sense that it captures all relevant relations between concepts. There is a method for Completing Description Logic Knowledge Bases using Formal Concept Analysis, but it takes exponential time in the worst case. During the internship, you will have to develop a more efficient probably approximately correct (PAC) version of this method. By asking implication questions to a domain expert, the method should be able to approximate the relationships between a given set of concept names that hold in the expert’s model and enrich the knowledge base with the newly discovered relationships. Alternatively (and somewhat more ambitiously), we may choose to study PAC learnability of ontologies based on specific description logics (see Exact Learning of Lightweight Description Logic Ontologies for results concerning exact rather than PAC learning of such ontologies).
Collaborative Conceptual Exploration
Conceptual exploration is a family of knowledge-acquisition techniques based on formal concept analysis (Ganter and Obiedkov 2016). The goal is to build a complete (with respect to a fixed language) implicational theory of a domain by posing queries to the domain expert. When properly implemented, it is a great tool that can help organise the process of scientific discovery. The existing conceptual exploration methods assume a single oracle with a thorough knowledge of the domain. We aim to extend this model to a more practically relevant setting where, in place of a single omniscient and unerring oracle, we have multiple experts who have incomplete or even contradictory knowledge of the domain. Such extensions are known for other active learning models (Donmez and Carbonell 2008, Yan et al. 2011, Chakraborty 2020) but will be novel in conceptual exploration. Adapting exploration procedures to work with several imperfect experts requires modifying the mathematical model; developing new efficient algorithms; defining a strategy for experts’ interaction that allows for conflict resolution between contradictory opinions, along with techniques to combine the results of independent work by several expert groups; and providing the ability to withdraw previous decisions based on new information (support for nonmonotonic reasoning). As for the joint work of experts, the learning methods to be developed fall into two groups. If, for a chosen description language, there exists a unique correct domain description (up to semantically equivalent syntactic transformations), we need methods that, under certain (e.g., probabilistic) assumptions on the completeness and correctness of experts’ knowledge, return an approximate (in a certain formal sense) domain description.
If there is no unique correct description and ontology construction is aimed at establishing consensus between various expert representations of the domain (which may be important, for example, for the humanities), there is a need for methods that can identify the knowledge shared by all experts and highlight positions on which experts disagree. During the internship, we will fix a precise problem statement of learning an implicational theory from queries to multiple experts and will try to adapt the existing learning algorithms to this setting.
Error Detection and Correction in Russian Texts Written by Non-native Speakers
General-purpose spellcheckers are usually designed so as to handle errors made by native speakers. Errors made by non-native speakers, e.g., language learners, are quite different, and they present a serious challenge for automated detection and correction. During the internship, you will have to study various approaches addressing this challenge, such as those described in the paper Grammar Error Correction in Morphologically-Rich Languages: The Case of Russian, and try to reproduce and improve results presented in the literature.
Frequently asked questions
We invite current undergraduate, graduate, and postgraduate students from all over the world. The key requirement is experience in the research area of the internship.
No, you do not need to provide such a certificate. However, your interview with a potential academic supervisor will be conducted in English.
No, there is none; all internships are offered free of charge.
Yes, we can cover your travel costs and accommodation.
Cecilia Tosciry (Oxford University)
In Moscow, I worked with two researchers: Professor Fedor Ratnikov of HSE University, who works on the LHCb experiment at CERN, and Andrey Ustyuzhanin, who is responsible for joint projects of CERN and Yandex. At the beginning of the internship, Andrey Ustyuzhanin and I discussed my project in detail: he asked me about research problems and advised me on relevant articles. This was very useful: I learned about various algorithms for finding similarities between objects. But I will remember the trip to Moscow for more than just research. I was delighted with the local food: it was like going on holiday to a hospitable grandmother's house. All in all, the trip went well, and I even learned a little Russian.
Belhal Karimi (École polytechnique, Université Paris-Saclay)
I really liked the way the research was organised, with students and supervisors working on projects together. There were six to ten researchers in the lab every day, and we helped each other informally, sharing the results of experiments. Perhaps I will come to Moscow again: my supervisor at École polytechnique often goes to Russia. I enjoyed this trip, especially the Mayakovskaya district, where I lived, and the weekend I spent in St. Petersburg.
Leo Botelle (École Normale Supérieure de Paris)
I had wanted to go to Russia for a long time, so I started reading about universities with strong data analysis programmes in Moscow. HSE University turned out to be one of them. When I first arrived, I was going to research applications of machine learning to building social graphs. A new topic was suggested by Sergey [Kuznetsov]: it turned out to be quite complicated and required a strong mathematical background. However, during the two months I spent in Moscow, I was able to sharpen my skills, which will be useful in my subsequent research.
Diego Granziol (Oxford University)
I cannot stress enough how lucky I was and what an honour it was to come to Russia and work with the whole Bayesian Methods Research Group. The atmosphere was warm and welcoming. Timur [Garipov] is a truly amazing coder, and without him I would not have moved forward on any of my own ideas. I think he has the potential to do truly exceptional research, and I'm incredibly happy to follow his progress. I thank Dmitry [Vetrov], whose final contribution to the paper we sent to NeurIPS was absolutely essential, for our regular meetings, advice, questions, support and time.