The Faculty of Computer Science invites international students to participate in research internships. The duration of the internship is two to six months.

About the Faculty

The Faculty of Computer Science has twelve laboratories engaged in research in theoretical computer science, big data, optimization methods, machine learning, computer vision, software engineering, bioinformatics, and topology. The Faculty cooperates with the most important Russian and international research institutions and companies, including the Russian Academy of Sciences, CERN, Samsung, and Yandex. Scientists from all over the world take part in the work of the laboratories and teach at the Faculty. The Faculty regularly holds conferences, schools, laboratory seminars, and the Faculty Colloquium.

Requirements

  • Being an undergraduate, graduate, or postgraduate student
  • Experience in the research area of the chosen internship
  • Good command of English (Russian is not required)

Application procedure

Your application should include:

  • CV
  • Cover letter
  • University transcript with your grades
  • The name of the project you’re interested in
  • English proficiency certificate (not obligatory)

On successful completion of the initial selection, the candidate is invited for an interview with a potential academic supervisor. The interview is conducted in English.

Available projects

Laboratory; Academic supervisor; Project name; Required level of studies; Duration and mode; Prerequisites

PAIS Lab

Alexey Mitsyuk

Process Simulation and Log Generation

MSc

4-5 months

Online/Offline

Foundations of computer science; Petri nets and other models of concurrency; strong programming skills (Java/Kotlin).

Laboratory of Complex Systems Modeling and Control

Sasha Shapoval

Universality Classes in Sandpiles

BSc, MSc, PhD, postdoc

2-5 months

Online/Offline

N/A

Laboratory of Algebraic Topology and its Applications

Anton Ayzenberg

Project 1. Coordinates and singularities of misorientation spaces

Project 2. TDA and FCA in experiments with neurons

BSc, MSc, PhD

2-5 months

Online/Offline

Project 1. Low-dimensional topology (the candidate should understand the basic concepts: homeomorphism, smooth manifold, group action on a topological space, fundamental group) and either the basics of invariant theory for finite group actions or basic programming skills in Python.

Project 2. Basics of topological data analysis (understanding of homotopy equivalence, the Nerve lemma, and persistent homology). Strong background in discrete mathematics, including an understanding of partially ordered sets. Programming skills in Python.

HDI Lab

Quentin Paris

Geometry of the Jensen-Shannon metric:

Topic 1. Geometry of J

Topic 2. Barycenters

Topic 3. Variance inequalities

BSc, MSc, PhD

TBA

N/A

School of Data Analysis and Artificial Intelligence

Sergei Obiedkov

Project 1. Probably Approximately Correct Learning of Ontologies Based on Description Logics

Project 2. Collaborative Conceptual Exploration

Project 3. Error Detection and Correction in Russian Texts Written by Non-native Speakers

BSc, MSc, PhD, postdoc

These projects will commence at the beginning of the 2021-22 academic year

3-5 months

Online/Offline

Project 1. Basic knowledge of mathematical logic (preferably, with some background in description logics), probability theory, and algorithm analysis.

Project 2. Basic knowledge of propositional logic, probability theory, and algorithm analysis.

Project 3. Some familiarity with machine learning and NLP methods and tools is desirable.

Process Simulation and Log Generation

Gena is a simple tool that generates event logs by executing process models. It is useful for testing and evaluating process mining algorithms. Gena supports two important notations used in business process modelling and process mining: Petri nets and basic BPMN models. However, Gena could be improved in many ways to support more sophisticated simulation scenarios. Interns have an opportunity to join this project.
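The idea of generating an event log by executing a process model can be illustrated with a minimal Python sketch. It is not Gena's implementation or API: the toy net, the names `TRANSITIONS`, `simulate_trace`, and `generate_log`, and the uniform random choice among enabled transitions are all assumptions made for illustration.

```python
import random

# Hypothetical toy Petri net: each transition maps to (input places, output places).
# Firing consumes one token from every input place and adds one to every output place.
TRANSITIONS = {
    "register": ({"start"}, {"p1"}),
    "check": ({"p1"}, {"p2"}),
    "approve": ({"p2"}, {"end"}),
    "reject": ({"p2"}, {"end"}),
}


def simulate_trace(transitions, initial_marking, rng, max_steps=100):
    """Play the token game once and return the sequence of fired transitions."""
    marking = dict(initial_marking)
    trace = []
    for _ in range(max_steps):
        enabled = sorted(
            name
            for name, (inputs, _) in transitions.items()
            if all(marking.get(place, 0) > 0 for place in inputs)
        )
        if not enabled:
            break  # no transition can fire: the run is over
        name = rng.choice(enabled)  # assumption: uniform choice among enabled transitions
        inputs, outputs = transitions[name]
        for place in inputs:
            marking[place] -= 1
        for place in outputs:
            marking[place] = marking.get(place, 0) + 1
        trace.append(name)
    return trace


def generate_log(transitions, initial_marking, n_traces, seed=0):
    """Return an event log as a list of traces (lists of activity names)."""
    rng = random.Random(seed)
    return [simulate_trace(transitions, initial_marking, rng) for _ in range(n_traces)]


log = generate_log(TRANSITIONS, {"start": 1}, n_traces=5)
```

Each trace in `log` is one run of the model. A more realistic generator would also attach timestamps, resources, and noise, which is the kind of scenario the project description alludes to.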

Process Mining Using Neural Networks

Process mining [1] is a modern research discipline that aims at improving (mostly business) processes on the basis of so-called event logs. Event logs are created by information systems that support processes (process-aware information systems, PAIS) as a technical by-product of their operation. At the same time, event logs are a source of valuable information about how processes actually run. Thus, event logs can be used to extract process models (the discovery task), to check the real behaviour of a process against its prescribed model (the conformance checking task), and, ultimately, to improve processes (the process enhancement task).
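As a small illustration of what working with an event log looks like, the sketch below counts the directly-follows relation, a simple summary that many discovery techniques take as a starting point. The log contents and the function name `directly_follows` are hypothetical; this is not the method of any particular algorithm from the project.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log: each trace is the ordered list of activities of one case.
example_log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]
dfg = directly_follows(example_log)
```

Here `dfg[("register", "check")]` is 3 and `dfg[("check", "approve")]` is 2. Pairs that never occur, such as `("approve", "check")`, have count 0.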

By now, there are a large number of algorithms, methods, and techniques for extracting and analyzing process models. They are mostly based on purely algorithmic or mixed statistical approaches. However, only a few works use neural-network-based approaches to process event data; see, for example, [2].

The use of neural networks and deep learning for solving process mining problems seems to be very promising, especially for working with the large event logs that one has to deal with in practice. The project is dedicated to improving existing process mining algorithms and developing new ones, as well as methods and tools that use neural networks.


[1] Wil van der Aalst. Process Mining: Data Science in Action. 2nd ed. Springer, 2016.

[2] T. Shunin, N. Zubkova, and S. Shershakov. Neural Approach to the Discovery Problem in Process Mining. In Proceedings: AIST 2018 Conference, LNCS 11179, pp. 261–273, Springer, 2018.

Universality Classes in Sandpiles

The project deals with self-organized critical systems and the prediction of rare events generated by them. Self-organized criticality (SOC) is attributed to various observed systems and processes related to physics and economics. The first model with SOC was defined by Bak, Tang, and Wiesenfeld in 1987. They introduced a mechanism (the BTW mechanism) which balances two multi-scale processes: a constant slow loading and a quick stress release. This mechanism results in power-law probability distributions that characterize the critical state, but it lacks parameters with which to fit the model regularities to observations.
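The interplay of slow loading and quick stress release can be sketched in Python with the standard BTW rule on a square lattice: a site holding four or more grains topples and passes one grain to each of its four neighbours, and grains that leave the lattice are lost. The lattice size, the number of grains, and the function names are assumptions chosen for illustration; this is not the modified model the project plans to study.

```python
import random


def add_grain_and_relax(grid, i, j, threshold=4):
    """Drop one grain at (i, j), topple until stable, return the avalanche size."""
    n = len(grid)
    grid[i][j] += 1
    unstable = [(i, j)]
    topplings = 0
    while unstable:
        x, y = unstable.pop()
        if grid[x][y] < threshold:
            continue  # already relaxed by an earlier toppling
        grid[x][y] -= threshold
        topplings += 1
        if grid[x][y] >= threshold:
            unstable.append((x, y))
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            u, v = x + dx, y + dy
            if 0 <= u < n and 0 <= v < n:  # grains leaving the lattice are lost
                grid[u][v] += 1
                if grid[u][v] >= threshold:
                    unstable.append((u, v))
    return topplings


def simulate(n=10, grains=2000, seed=0):
    """Slow loading: add grains one by one at random sites, recording avalanche sizes."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        sizes.append(add_grain_and_relax(grid, rng.randrange(n), rng.randrange(n)))
    return grid, sizes


grid, sizes = simulate()
```

The list `sizes` holds the number of topplings triggered by each added grain. On larger lattices and longer runs, a histogram of these sizes is where the power-law behaviour mentioned above is usually examined.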

The planned contribution of this project is twofold. First, we intend to establish that the BTW mechanism can be designed on a square lattice in such a way that the power-law exponent becomes adjustable. This contribution will extend the current picture of sandpile models, in which two universality classes are realized in various settings. The extension is based on the idea that several processes of stress propagation should be united into a single event if they occur next to one another in space and time. The second direction of this project pursues the prediction of extremes. We plan to address the prediction problem in sandpiles, exposing the predictability of events that are so large that they occur only in the supercritical state. In other words, extra system loading is required to make the system generate extremes. This natural property underlies predictability. The research has to reconcile general claims regarding unpredictability in sandpiles, which may describe typical events, with some examples of efficient predictions related to large events.

Project 1. Coordinates and singularities of misorientation spaces

A misorientation is a measure of the displacement of one crystal lattice relative to another. The space of all misorientations is a two-sided identification space G1\SO(3)/G2, where G1 and G2 are the point crystallography groups of the lattices. Misorientation space is a 3-dimensional orbifold whose underlying topological space depends on the crystallography groups. Surprisingly often, however, this space is homeomorphic to the 3-sphere, by the Poincaré conjecture. During the internship, you can do one of two tasks (or both). (1) For particular choices of crystallography groups, construct convenient coordinates on the corresponding misorientation spaces (in particular, you can try to prove the homeomorphism to the 3-sphere without referring to the Poincaré conjecture). (2) As a 3-orbifold, each misorientation space has a 3-valent weighted graph as its orbifold locus, or singularity. The task is to describe the orbifold singularities for all pairs of point crystallography groups. This problem can be approached either by visualizing things in Python or by careful examination of known papers in invariant theory and low-dimensional topology.

Project 2. TDA and FCA in experiments with neurons

There is a famous experiment confirming the existence of so-called place cells in the mammalian hippocampus. In this experiment, a mouse moves freely in a labyrinth while the activity of some set of its neurons is recorded. The general goal is to understand the neurons' activation patterns and their relation to the shape of the labyrinth, the trajectory of the rodent, and so on. Generally, one can reconstruct the topology of the labyrinth from neural activity, based on the Nerve lemma. Instead of taking the nerves of the covers, one can try to find implications between different neurons by constructing the lattice of formal concepts that relates the neural activity to the physical position of the mouse. One can then study the topology of this lattice instead of the nerve. During the internship, you will join our research team working on this subject and analyse data obtained in real experiments. Your particular task will be to understand the theoretical basics of formal concept analysis and to review the packages that can be combined with existing TDA packages in order to proceed with the task described above. As a minimum, we expect that you will describe some basic implications between the activities of different groups of neurons.
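The notion of an implication between groups of neurons can be made concrete with a short Python sketch in the spirit of formal concept analysis. The data layout (one set of active neurons per time bin), the toy recording, and the function names are assumptions for illustration only. The brute-force search below does not produce a minimal implication basis.

```python
from itertools import combinations

# Hypothetical recording: for each time bin, the set of neurons that fired.
recording = [
    {"n1", "n2"},
    {"n1", "n2", "n3"},
    {"n2", "n3"},
    {"n3"},
]


def closure(neurons, recording):
    """Neurons that fire in every time bin in which all of `neurons` fire."""
    bins = [fired for fired in recording if neurons <= fired]
    if not bins:
        # The premise was never observed, so it vacuously implies everything.
        return set().union(*recording)
    return set.intersection(*bins)


def implications(recording, max_premise=2):
    """List implications premise -> extra neurons, for premises up to a given size."""
    all_neurons = sorted(set().union(*recording))
    result = []
    for k in range(1, max_premise + 1):
        for premise in combinations(all_neurons, k):
            extra = closure(set(premise), recording) - set(premise)
            if extra:
                result.append((premise, tuple(sorted(extra))))
    return result


found = implications(recording)
```

In this toy recording, `found` contains `(("n1",), ("n2",))`: whenever `n1` fires, `n2` fires as well. It also contains the redundant `(("n1", "n3"), ("n2",))`, which follows from the first implication.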

Geometry of J

The first goal of this project is to further understand the geometric properties of the Jensen-Shannon metric J. For instance, one objective would be to study whether the set P(R^d), endowed with J, is a geodesic metric space, and to characterize its shortest paths and, eventually, its curvature properties. Detailed description.
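For orientation, the sketch below computes a Jensen-Shannon distance between two discrete distributions on a finite set. It assumes that J denotes the square root of the Jensen-Shannon divergence (with natural logarithms), which is the usual way this divergence is turned into a metric. The page itself does not define J, and the project concerns distributions on R^d, so this is only a finite toy analogue with hypothetical function names.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint mixture
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding errors


p = [0.5, 0.5, 0.0]
q = [0.1, 0.4, 0.5]
r = [0.0, 0.2, 0.8]
d_pq, d_qr, d_pr = js_distance(p, q), js_distance(q, r), js_distance(p, r)
```

With this definition the distance is symmetric, vanishes when the two distributions coincide, and equals the square root of ln 2 for distributions with disjoint supports.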

Barycenters

A related field of investigation is the study of barycenters relative to the metric J. Natural questions are whether such barycenters exist, whether they are unique, and whether they can be estimated consistently. These questions could be studied in the light of the recent paper by Le Gouic and Loubes (2017). Detailed description.

Variance inequalities

Given the recent work by Ahidar-Coutrix et al. (2018), another interesting question is whether one can establish so-called variance inequalities for the metric J. Such an inequality was shown to guarantee fast convergence rates for certain estimators of barycenters in Ahidar-Coutrix et al. (2018) and would be of great interest from a statistical point of view. Detailed description.

Probably Approximately Correct Learning of Ontologies Based on Description Logics

In description logic knowledge bases, there may be a need to ensure that the terminological part of the base is complete in the sense that it captures all relevant relations between concepts. There is a method for Completing Description Logic Knowledge Bases using Formal Concept Analysis, but it takes exponential time in the worst case. During the internship, you will have to develop a more efficient probably approximately correct (PAC) version of this method. By asking implication questions to a domain expert, the method should be able to approximate the relationships between a given set of concept names that hold in the expert’s model and enrich the knowledge base with the newly discovered relationships. Alternatively (and somewhat more ambitiously), we may choose to study PAC learnability of ontologies based on specific description logics (see Exact Learning of Lightweight Description Logic Ontologies for results concerning exact rather than PAC learning of such ontologies).

Collaborative Conceptual Exploration

Conceptual exploration is a family of knowledge-acquisition techniques based on formal concept analysis (Ganter and Obiedkov 2016). The goal is to build a complete (with respect to a fixed language) implicational theory of a domain by posing queries to the domain expert. When properly implemented, it is a great tool that can help organise the process of scientific discovery. The existing conceptual exploration methods assume a single oracle with a thorough knowledge of the domain. We aim to extend this model to a more practically relevant setting where, in place of a single omniscient and unerring oracle, we have multiple experts who have incomplete or even contradictory knowledge of the domain. Such extensions are known for other active learning models (Donmez and Carbonell 2008, Yan et al. 2011, Chakraborty 2020) but will be novel in conceptual exploration.

Adapting exploration procedures to work with several imperfect experts requires modifying the mathematical model; developing new efficient algorithms; defining a strategy for the experts’ interaction that allows for conflict resolution between contradictory opinions; developing techniques to combine the results of the independent work of several expert groups; and providing the ability to withdraw previous decisions based on new information (support for nonmonotonic reasoning).

As for the joint work of experts, the learning methods to be developed fall into two groups. If, for a chosen description language, there exists a unique correct domain description (up to semantically equivalent syntactic transformations), there is a need for methods that, under certain (e.g., probabilistic) assumptions on the completeness and correctness of the experts’ knowledge, will return an approximate (in a certain formal sense) domain description. If there is no unique correct description and ontology construction is aimed at establishing consensus between various expert representations of the domain (which may be important, for example, for the humanities), there is a need for methods that can identify the knowledge shared by all experts and highlight positions on which the experts disagree.

During the internship, we will fix a precise problem statement of learning an implicational theory from queries to multiple experts and will try to adapt the existing learning algorithms to this setting.
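To make the multi-expert setting tangible, here is a deliberately naive Python sketch of asking several imperfect experts about one implication. Everything in it is an assumption for illustration: each expert is modelled as a list of objects (attribute sets) the expert believes exist, and an implication is rejected only if at least `min_refutations` experts know a counterexample. This is one possible aggregation rule, not the model the project intends to develop.

```python
def refutes(expert_objects, premise, conclusion):
    """True if the expert knows an object that has the premise but not the conclusion."""
    return any(premise <= obj and not conclusion <= obj for obj in expert_objects)


def ask_experts(experts, premise, conclusion, min_refutations=2):
    """Accept the implication unless enough experts provide a counterexample."""
    votes = sum(refutes(objects, premise, conclusion) for objects in experts)
    return votes < min_refutations


# Hypothetical experts with incomplete knowledge of the same domain.
experts = [
    [{"bird", "flies"}, {"bird", "penguin"}],
    [{"bird", "flies"}],
    [{"bird", "penguin"}, {"bird", "flies"}],
]

bird_implies_flies = ask_experts(experts, {"bird"}, {"flies"})
penguin_implies_bird = ask_experts(experts, {"penguin"}, {"bird"})
```

Two of the three experts know a bird that does not fly, so `bird_implies_flies` is `False`, while no expert refutes the second query, so `penguin_implies_bird` is `True`. A threshold rule like this ignores how reliable each expert is, which is exactly the kind of issue a precise problem statement would need to address.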

Error Detection and Correction in Russian Texts Written by Non-native Speakers

General-purpose spellcheckers are usually designed to handle errors made by native speakers. Errors made by non-native speakers, e.g., language learners, are quite different, and they present a serious challenge for automated detection and correction. During the internship, you will study various approaches addressing this challenge, such as those described in the paper Grammar Error Correction in Morphologically-Rich Languages: The Case of Russian, and try to reproduce and improve the results presented in the literature.

About HSE University

Consistently ranked as one of Russia’s top universities, HSE University is a leader in Russian education and one of the preeminent universities in Eastern Europe and Eurasia. Having rapidly grown into a renowned research university over two decades, HSE University sets itself apart with its international presence and cooperation.

In March 2014 HSE University together with Yandex, a major Russian IT company, opened its new Faculty of Computer Science. The Faculty aims at preparing highly qualified data scientists, software engineers, and computer science researchers for leading Russian and international IT companies and academic institutions.

Frequently asked questions

Who can participate?

We invite current undergraduate, graduate, and postgraduate students from all over the world. The key requirement is experience in the research area of the internship.

Do we need to provide an official English language certificate?

No, you do not need to provide such a certificate. However, your interview with a potential academic supervisor will be conducted in English.

Is there a registration fee?

No, there is no registration fee. All internships are offered free of charge.

Is there any scholarship or financial support?

Yes, we can cover your travel costs and accommodation.

Past participants

Students from Oxford University (UK), École Normale Supérieure de Paris (France), Università degli Studi di Padova (Italy), Université de Toulouse (France), Instituto Superior Técnico (Portugal), École Centrale Supérieure de Marseille (France), and INSA Lyon (France) have participated in our internships.

Contacts

Sergey Karapetyan

skarapetyan@hse.ru
+7 (495) 531-00-00 *27344