The Faculty of Computer Science invites international students to participate in research internships. The duration of the internship is two to six months.
About the Faculty
The Faculty of Computer Science has twelve laboratories engaged in research in theoretical computer science, big data, optimization methods, machine learning, computer vision, software engineering, bioinformatics, and topology. The Faculty cooperates with major Russian and international research institutions and companies, including the Russian Academy of Sciences, CERN, Samsung, and Yandex. Scientists from all over the world take part in the work of the laboratories and teach at the Faculty. The Faculty regularly holds conferences, schools, laboratory seminars, and the Faculty Colloquium.
Requirements
 Being an undergraduate, graduate, or postgraduate student
 Experience in the research area of the chosen internship
 Good command of English (Russian is not required)
Application procedure
Your application should include:
 CV
 Cover letter
 University transcript with your grades
 The name of the project you’re interested in
 English proficiency certificate (not obligatory)
On successful completion of the initial selection, the candidate is invited for an interview with a potential academic supervisor. The interview is conducted in English.
Available projects
Laboratory  Academic supervisor  Project name  Required level of studies  Duration and mode  Prerequisites

Alexey Mitsyuk 
MSc 
4–5 months, online/offline
Foundations of computer science; Petri nets and other models of concurrency; strong programming skills (Java/Kotlin). 

Sasha Shapoval 
BSc, MSc, PhD, postdoc 
2–5 months, online/offline
N/A 

Anton Ayzenberg 
Project 1. Coordinates and singularities of misorientation spaces 
BSc, MSc, PhD 
2–5 months, online/offline
Project 1. Low-dimensional topology (the candidate should understand the basic concepts: homeomorphism, smooth manifold, group action on a topological space, fundamental group) and either the basics of invariant theory for finite group actions or basic programming skills in Python.
Project 2. Basics of topological data analysis (homotopy equivalence, the Nerve lemma, persistent homology); a strong background in discrete mathematics, including an understanding of partially ordered sets; programming skills in Python.

Quentin Paris 
Geometry of the Jensen–Shannon metric
BSc, MSc, PhD 
TBA 
N/A 

Sergei Obiedkov 
Project 1. Probably Approximately Correct Learning of Ontologies Based on Description Logics
Project 2. Collaborative Conceptual Exploration
Project 3. Error Detection and Correction in Russian Texts Written by Non-native Speakers
BSc, MSc, PhD, postdoc 
These projects will commence at the beginning of the 2021–22 academic year. 3–5 months, online/offline
Project 1. Basic knowledge of mathematical logic (preferably, with some background in description logics), probability theory, and algorithm analysis.
Project 2. Basic knowledge of propositional logic, probability theory, and algorithm analysis.
Project 3. Some familiarity with machine learning and NLP methods and tools is desirable. 
Process Simulation and Log Generation
Gena is a simple tool that generates event logs by executing process models. It is useful for testing and evaluating process mining algorithms. Gena supports two important notations used in business process modelling and process mining: Petri nets and basic BPMN models. However, many improvements can be made to Gena to support more sophisticated simulation scenarios. Interns have the opportunity to join this project.
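To give a feel for what generating an event log by executing a process model means, here is a minimal Python sketch. It is not Gena's implementation or API: the toy Petri net, the function names, and the random firing policy are all assumptions made for illustration.

```python
import random

# Hypothetical toy net: transition -> (input places, output places).
# All arcs have weight 1 in this sketch.
TRANSITIONS = {
    "register": ({"start"}, {"p1"}),
    "check": ({"p1"}, {"p2"}),
    "approve": ({"p2"}, {"end"}),
    "reject": ({"p2"}, {"end"}),
}


def enabled(marking, transitions):
    """Return the transitions whose input places all hold a token."""
    return [
        name
        for name, (inputs, _) in transitions.items()
        if all(marking.get(place, 0) > 0 for place in inputs)
    ]


def simulate_trace(transitions, initial_marking, rng, max_steps=100):
    """Fire randomly chosen enabled transitions until none is enabled."""
    marking = dict(initial_marking)
    trace = []
    for _ in range(max_steps):
        candidates = enabled(marking, transitions)
        if not candidates:
            break
        name = rng.choice(candidates)
        inputs, outputs = transitions[name]
        for place in inputs:       # consume one token per input place
            marking[place] -= 1
        for place in outputs:      # produce one token per output place
            marking[place] = marking.get(place, 0) + 1
        trace.append(name)
    return trace


def generate_log(transitions, initial_marking, n_traces, seed=0):
    """An event log is modelled here as a list of traces (activity sequences)."""
    rng = random.Random(seed)
    return [
        simulate_trace(transitions, initial_marking, rng)
        for _ in range(n_traces)
    ]


if __name__ == "__main__":
    for trace in generate_log(TRANSITIONS, {"start": 1}, n_traces=5):
        print(trace)
```

Each trace is one run of the model from the initial marking. More sophisticated simulation scenarios (timestamps, resources, noise) would extend the trace records beyond bare activity names.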
Process Mining Using Neural Networks
Process mining [1] is a modern research discipline that aims at improving (mostly business) processes on the basis of so-called event logs. Event logs are created by information systems that support processes (process-aware information systems, PAIS) as a technical by-product of their operation. At the same time, event logs are a source of valuable information about how processes are actually executed. Thus, event logs can be used to extract process models (the discovery task), to check the real behaviour of a process against its prescribed model (the conformance checking task), and, ultimately, to improve processes (the process enhancement task).
By now, there are a large number of algorithms, methods, and techniques for extracting and analyzing process models. They are mostly based on purely algorithmic or mixed statistical approaches. However, only very few works use neural-network-based approaches to process event data; see, for example, [2].
The use of neural networks and deep learning for solving process mining problems seems very promising, especially for working with the large event logs that one has to deal with in practice. The project is dedicated to elaborating existing process mining algorithms and developing new ones, as well as methods and tools that use neural networks.
[1] Wil van der Aalst. Process Mining: Data Science in Action. 2nd ed. Springer, 2016.
[2] T. Shunin, N. Zubkova, and S. Shershakov. Neural Approach to the Discovery Problem in Process Mining. In Proceedings: AIST 2018 Conference, LNCS 11179, pp. 261–273, Springer, 2018.
Universality Classes in Sandpiles
The project deals with self-organized critical systems and the prediction of rare events generated by them. Self-organized criticality (SOC) is attributed to various observed systems and processes in physics and economics. The first model with SOC was defined by Bak, Tang, and Wiesenfeld in 1987. They introduced a mechanism (the BTW mechanism) that balances two multiscale processes: a constant slow loading and a quick stress release. This mechanism results in power-law probability distributions that characterize the critical state, but it lacks parameters to fit the model regularities to observations.
The planned contribution of this project is twofold. First, we intend to establish that the BTW mechanism can be designed on a square lattice in such a way that the power-law exponent becomes adjustable. This contribution will extend the current picture of sandpile models with two universality classes realized in various settings. The extension is based on the idea that several processes of stress propagation should be united into a single event if they occur next to one another in space and time. The second direction of this project pursues the prediction of extremes. We plan to address the prediction problem in sandpiles, exposing the predictability of events that are so large that they occur only in the supercritical state. In other words, extra system loading is required to make the system generate extremes. This natural property underlies predictability. The research has to reconcile general claims regarding unpredictability in sandpiles, which may describe typical events, with some examples of efficient predictions related to large events.
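For readers unfamiliar with the BTW mechanism, the following Python sketch simulates the classical sandpile on a square lattice with open boundaries. It shows only the standard model (threshold 4, one grain added per step), not the modified model with an adjustable exponent that the project aims to construct. The lattice size, number of steps, and function names are illustrative assumptions.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def relax(grid):
    """Topple every site holding 4 or more grains until the grid is stable.

    Returns the avalanche size, counted as the number of topplings.
    Grains pushed across the boundary are lost (open boundary).
    """
    n = len(grid)
    size = 0
    unstable = [(i, j) for i in range(n) for j in range(n) if grid[i][j] >= 4]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < 4:          # already relaxed by an earlier toppling
            continue
        grid[i][j] -= 4
        size += 1
        if grid[i][j] >= 4:
            unstable.append((i, j))
        for di, dj in NEIGHBOURS:
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                grid[a][b] += 1
                if grid[a][b] >= 4:
                    unstable.append((a, b))
    return size


def run(n=20, steps=5000, seed=0):
    """Alternate slow loading (one grain) with fast stress release."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        grid[i][j] += 1
        sizes.append(relax(grid))
    return grid, sizes


if __name__ == "__main__":
    _, sizes = run()
    print("largest avalanche:", max(sizes))
```

A histogram of the nonzero values in `sizes` on log-log axes is the usual way to look for the power-law regime mentioned above. On a lattice this small, finite-size effects are strong.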
Project 1. Coordinates and singularities of misorientation spaces
A misorientation is a measure of the displacement of one crystal lattice relative to another. The space of all misorientations is a two-sided identification space G1\SO(3)/G2, where G1 and G2 are the crystallographic point groups of the lattices. It can be seen that a misorientation space is a 3-dimensional orbifold. The underlying topological space of this orbifold depends on the crystallographic groups. However, surprisingly often this space is homeomorphic to the 3-sphere, by the Poincaré conjecture. During the internship, you can do one of the two tasks (or both). (1) For particular choices of crystallographic groups, construct convenient coordinates on the corresponding misorientation spaces (in particular, you can try to prove the homeomorphism to the 3-sphere without referring to the Poincaré conjecture). (2) As a 3-orbifold, each misorientation space has a 3-valent weighted graph as its orbifold locus, or singularity. The task is to describe the orbifold singularities for all pairs of crystallographic point groups. This problem can be approached either by visualizing things in Python or by careful examination of known papers in invariant theory and low-dimensional topology.
Project 2. TDA and FCA in experiments with neurons
There is a famous experiment confirming the existence of so-called place cells in the mammalian hippocampus. In this experiment, a mouse moves freely in a labyrinth while the activity of a set of its neurons is recorded. The general goal is to understand the neurons' activation patterns and their relation to the shape of the labyrinth, the trajectory of the rodent, and so on. In general, one can reconstruct the topology of the labyrinth from neural activity using the Nerve lemma. Instead of taking the nerves of the covers, one can try to find implications between different neurons by constructing the lattice of formal concepts that relates neural activity to the physical position of the mouse. One can then study the topology of this lattice instead of the nerve. During the internship, you will join our research team working on this subject and use data obtained in real experiments. Your particular task will be to understand the theoretical basics of formal concept analysis and to review the packages that can be combined with existing TDA packages in order to carry out the task described above. As a minimum, we expect that you will be able to describe some basic implications between the activities of different groups of neurons.
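As a small illustration of what an implication between neurons means in formal concept analysis, the Python sketch below computes attribute closures in a toy formal context. The data are entirely hypothetical (four positions, four neurons), and the function names are chosen for this example only. Real recordings would first need to be binarized into such a context.

```python
# Hypothetical formal context: positions in the labyrinth (objects)
# mapped to the set of neurons active there (attributes).
CONTEXT = {
    "pos_A": {"n1", "n2"},
    "pos_B": {"n1", "n2", "n3"},
    "pos_C": {"n3"},
    "pos_D": {"n1", "n2", "n4"},
}


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def intent(context, objs):
    """Attributes shared by every object in objs.

    For an empty set of objects this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objs:
        shared &= context[obj]
    return shared


def closure(context, attrs):
    """Largest attribute set that holds wherever attrs holds."""
    return intent(context, extent(context, attrs))


def implication_holds(context, premise, conclusion):
    """premise -> conclusion is valid iff conclusion lies in closure(premise)."""
    return conclusion <= closure(context, premise)


if __name__ == "__main__":
    print(closure(CONTEXT, {"n1"}))
    print(implication_holds(CONTEXT, {"n1"}, {"n2"}))
    print(implication_holds(CONTEXT, {"n3"}, {"n1"}))
```

In this toy context, every position where `n1` fires also has `n2` firing, so the implication from `n1` to `n2` is valid, while `n3` does not imply `n1`. The closed attribute sets are exactly the intents of the formal concepts that make up the lattice.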
Geometry of J
The first goal of this project is to further understand the geometric properties of the metric J. For instance, one objective would be to study whether the set P(R^d), endowed with J, is a geodesic metric space, and to characterize its shortest paths and, eventually, its curvature properties.
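The text above does not spell out the definition of J. Assuming that J is the Jensen–Shannon metric named in the project title, that is, the square root of the Jensen–Shannon divergence, the following Python sketch computes it for discrete distributions on a common finite support. The restriction to discrete distributions is a simplification for illustration, since the project concerns probability measures on R^d.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q.

    p and q are sequences of probabilities over the same finite support.
    The mixture m is positive wherever p or q is, so kl() never divides by 0.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.6, 0.3]
    print(js_distance(p, q))
```

With base-2 logarithms the distance lies between 0 and 1, reaching 1 for distributions with disjoint supports.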
Barycenters
A related field of investigation is the study of barycenters relative to the metric J. Natural questions are whether such barycenters exist, whether they are unique, and whether one can estimate them in a consistent way. These questions could be studied in the light of the recent paper by Le Gouic and Loubes (2017).
Variance inequalities
Given the recent work by Ahidar-Coutrix et al. (2018), another interesting question is whether one can establish so-called variance inequalities for the metric J. Such an inequality was shown in Ahidar-Coutrix et al. (2018) to guarantee fast convergence rates for certain estimators of barycenters and would be of great interest from a statistical point of view.
Probably Approximately Correct Learning of Ontologies Based on Description Logics
In description logic knowledge bases, there may be a need to ensure that the terminological part of the base is complete in the sense that it captures all relevant relations between concepts. There is a method for Completing Description Logic Knowledge Bases using Formal Concept Analysis, but it takes exponential time in the worst case. During the internship, you will have to develop a more efficient probably approximately correct (PAC) version of this method. By asking implication questions to a domain expert, the method should be able to approximate the relationships between a given set of concept names that hold in the expert’s model and enrich the knowledge base with the newly discovered relationships. Alternatively (and somewhat more ambitiously), we may choose to study PAC learnability of ontologies based on specific description logics (see Exact Learning of Lightweight Description Logic Ontologies for results concerning exact rather than PAC learning of such ontologies).
Collaborative Conceptual Exploration
Conceptual exploration is a family of knowledge-acquisition techniques based on formal concept analysis (Ganter and Obiedkov 2016). The goal is to build a complete (with respect to a fixed language) implicational theory of a domain by posing queries to the domain expert. When properly implemented, it is a great tool that can help organise the process of scientific discovery. The existing conceptual exploration methods assume a single oracle with a thorough knowledge of the domain. We aim to extend this model to a more practically relevant setting where, in place of a single omniscient and unerring oracle, we have multiple experts who have incomplete or even contradictory knowledge of the domain. Such extensions are known for other active learning models (Donmez and Carbonell 2008, Yan et al. 2011, Chakraborty 2020) but will be novel in conceptual exploration. Adapting exploration procedures to work with several imperfect experts requires modifying the mathematical model; developing new efficient algorithms; defining a strategy for experts’ interaction that allows for conflict resolution between contradictory opinions, together with techniques for combining the results of the independent work of several expert groups; and providing the ability to withdraw previous decisions based on new information (support for nonmonotonic reasoning). As for the joint work of experts, the learning methods to be developed fall into two groups. If, for a chosen description language, there exists a unique correct domain description (up to semantically equivalent syntactic transformations), there is a need for methods that, under certain (e.g., probabilistic) assumptions on the completeness and correctness of experts’ knowledge, will return an approximate (in a certain formal sense) domain description.
If there is no unique correct description and ontology construction is aimed at establishing consensus between various expert representations of the domain (which may be important, for example, for the humanities), there is a need for methods that can identify the knowledge shared by all experts and highlight positions on which experts disagree. During the internship, we will fix a precise problem statement of learning an implicational theory from queries to multiple experts and will try to adapt the existing learning algorithms to this setting.
Error Detection and Correction in Russian Texts Written by Non-native Speakers
General-purpose spellcheckers are usually designed to handle errors made by native speakers. Errors made by non-native speakers, e.g., language learners, are quite different, and they present a serious challenge for automated detection and correction. During the internship, you will have to study various approaches addressing this challenge, such as those described in the paper Grammar Error Correction in Morphologically-Rich Languages: The Case of Russian, and try to reproduce and improve results presented in the literature.
About HSE University
Consistently ranked as one of Russia’s top universities, HSE University is a leader in Russian education and one of the preeminent universities in eastern Europe and Eurasia. Having rapidly grown into a renowned research university over two decades, HSE University sets itself apart with its international presence and cooperation.
In March 2014 HSE University together with Yandex, a major Russian IT company, opened its new Faculty of Computer Science. The Faculty aims at preparing highly qualified data scientists, software engineers, and computer science researchers for leading Russian and international IT companies and academic institutions.
Frequently asked questions
We invite current undergraduate, graduate, and postgraduate students from all over the world. The key requirement is experience in the research area of the internship.
You do not need to provide an English proficiency certificate. However, your interview with a potential academic supervisor will be conducted in English.
There is no participation fee: all internships are offered free of charge.
We can cover your travel costs and accommodation.
Past participants
Students from Oxford University (UK), École Normale Supérieure de Paris (France), Università degli Studi di Padova (Italy), Université de Toulouse (France), Instituto Superior Técnico (Portugal), École Centrale Supérieure de Marseille (France), and INSA Lyon (France) have participated in our internships.
Cecilia Tosciry (Oxford University)
In Moscow, I worked with two researchers: Professor Fedor Ratnikov of HSE University, who works on the LHCb experiment at CERN, and Andrey Ustyuzhanin, who is responsible for joint projects of CERN and Yandex. At the beginning of the internship, Andrey Ustyuzhanin and I discussed my project in detail: he asked me about research problems and advised me on relevant articles. This was very useful: I learned about various algorithms for finding similarities between objects. But the trip to Moscow was memorable for more than just research. I was delighted with the local food: it was like going on holiday to a hospitable grandmother's house. All in all, the trip went well, and I even learned a little Russian.
Belhal Karimi (École Polytechnique, Université Paris-Saclay)
I really liked how the research was organised, with students and supervisors working on projects together. There were six to ten researchers in the lab every day, and we helped each other informally, sharing the results of experiments. Perhaps I will come to Moscow again: my supervisor at École Polytechnique often goes to Russia. I enjoyed this trip, especially the Mayakovskaya district, where I lived, and the weekend I spent in St. Petersburg.
Leo Botelle (École Normale Supérieure de Paris)
I had wanted to go to Russia for a long time, so I started reading about universities in Moscow with strong data analysis programmes. HSE University turned out to be one of them. When I first arrived, I was going to research applications of machine learning to building social graphs. A new topic was suggested by Sergey [Kuznetsov]: it turned out to be quite complicated and required a strong mathematical background. However, during the two months I spent in Moscow, I was able to sharpen my skills, which will be useful in my subsequent research.
Diego Granziol (Oxford University)
I cannot stress enough how lucky I was and what an honour it was to come to Russia, to work with the whole Bayesian Methods Research Group. The atmosphere was warm and welcoming. Timur [Garilov] is a truly amazing coder, and without him I would not have moved forward on any of my own ideas. I think he has the potential to do truly quite exceptional research, and I'm incredibly happy to follow his progress. I thank Dmitry [Vetrov], whose final contribution to the paper we sent to NeurIPS was absolutely essential, for our regular meetings, advice, questions, support and time.
Contacts
skarapetyan@hse.ru
+7 (495) 5310000 *27344