
The Faculty of Computer Science invites international students to take part in research internships lasting two to six months.

About the Faculty

The Faculty of Computer Science has twelve laboratories conducting research in areas such as theoretical computer science, big data, optimization methods, machine learning, computer vision, software engineering, bioinformatics, and topology. The faculty collaborates with major Russian and international research organizations, including the Russian Academy of Sciences, CERN, Samsung, and Yandex. Researchers from all over the world take part in the laboratories' work. The faculty regularly hosts conferences and schools, and the laboratories' standing seminars and the faculty colloquium meet on an ongoing basis.

Requirements for candidates

  • Enrollment in a bachelor's, master's, or doctoral program
  • Experience in the chosen internship area
  • Proficiency in English (knowledge of Russian is not required)

Application procedure

An application must include:

  • CV
  • Cover letter
  • Transcript of grades
  • Name of the project of interest
  • Language certificate (optional)

Candidates who pass the initial selection are invited to an interview with their prospective academic supervisor. The interview is conducted in English.

Apply

List of available internships

PAIS Lab

  • Academic supervisor: Alexey Mitsyuk
  • Project name: Process Simulation and Log Generation
  • Required level of studies: MSc
  • Duration and mode: 4-5 months, Online/Offline
  • Prerequisites: Foundations of computer science; Petri nets and other models of concurrency; strong programming skills (Java/Kotlin).

Laboratory of Complex Systems Modeling and Control

  • Academic supervisor: Sasha Shapoval
  • Project name: Universality Classes in Sandpiles
  • Required level of studies: BSc, MSc, PhD, postdoc
  • Duration and mode: 2-5 months, Online/Offline
  • Prerequisites: N/A

Laboratory of Algebraic Topology and its Applications

  • Academic supervisor: Anton Ayzenberg
  • Project names: Project 1. Coordinates and singularities of misorientation spaces; Project 2. TDA and FCA in experiments with neurons
  • Required level of studies: BSc, MSc, PhD
  • Duration and mode: 2-5 months, Online/Offline
  • Prerequisites for Project 1: Low-dimensional topology (the candidate should understand the basic concepts: homeomorphism, smooth manifold, group action on a topological space, fundamental group) and either the basics of invariant theory for finite group actions or basic programming skills in Python.
  • Prerequisites for Project 2: Basics of topological data analysis (understanding of homotopy equivalence, the Nerve lemma, persistent homology). Strong background in discrete mathematics, including an understanding of partially ordered sets. Programming skills in Python.

HDI Lab

  • Academic supervisor: Quentin Paris
  • Project name: Geometry of the Jensen-Shannon metric (Topic 1. Geometry of J; Topic 2. Barycenters; Topic 3. Variance inequalities)
  • Required level of studies: BSc, MSc, PhD
  • Duration and mode: TBA
  • Prerequisites: N/A

School of Data Analysis and Artificial Intelligence

  • Academic supervisor: Sergei Obiedkov
  • Project names: Project 1. Probably Approximately Correct Learning of Ontologies Based on Description Logics; Project 2. Collaborative Conceptual Exploration; Project 3. Error Detection and Correction in Russian Texts Written by Non-native Speakers
  • Required level of studies: BSc, MSc, PhD, postdoc
  • Duration and mode: 3-5 months, Online/Offline
  • Prerequisites for Project 1: Basic knowledge of mathematical logic (preferably, with some background in description logics), probability theory, and algorithm analysis.
  • Prerequisites for Project 2: Basic knowledge of propositional logic, probability theory, and algorithm analysis.
  • Prerequisites for Project 3: Some familiarity with machine learning and NLP methods and tools is desirable.

Process Simulation and Log Generation

Gena is a simple tool that generates event logs by executing process models. It is useful for testing and evaluating process mining algorithms. Gena supports two important notations used in business process modelling and process mining: Petri nets and basic BPMN models. However, many improvements can be made to Gena to support more sophisticated simulation scenarios. Interns have the opportunity to join this project.
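To make the idea of generating an event log by executing a process model concrete, here is a minimal sketch in Python. It is not Gena's implementation or API: the toy net, the place and transition names, and the helper functions are all hypothetical, and the net is assumed to have arcs of weight one only.

```python
import random

# Hypothetical toy Petri net (an assumption for illustration only).
# Each transition maps to (input places, output places); all arcs have weight 1.
TRANSITIONS = {
    "register": ({"start"}, {"p1"}),
    "check": ({"p1"}, {"p2"}),
    "approve": ({"p2"}, {"end"}),
    "reject": ({"p2"}, {"end"}),
}
INITIAL_MARKING = {"start": 1}


def enabled(marking, transitions):
    """Return the sorted names of transitions whose input places all hold a token."""
    return sorted(
        name
        for name, (inputs, _) in transitions.items()
        if all(marking.get(place, 0) > 0 for place in inputs)
    )


def fire(marking, transitions, name):
    """Fire an enabled transition and return the new marking (input is not mutated)."""
    inputs, outputs = transitions[name]
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


def simulate_trace(transitions, initial, rng, max_steps=100):
    """Play the token game from the initial marking, choosing transitions at random."""
    marking, trace = dict(initial), []
    for _ in range(max_steps):
        choices = enabled(marking, transitions)
        if not choices:
            break  # no transition is enabled: the run is finished
        chosen = rng.choice(choices)
        marking = fire(marking, transitions, chosen)
        trace.append(chosen)
    return trace


def generate_log(transitions, initial, n_traces, seed=0):
    """Return an event log as a list of traces (lists of transition names)."""
    rng = random.Random(seed)
    return [simulate_trace(transitions, initial, rng) for _ in range(n_traces)]


if __name__ == "__main__":
    for trace in generate_log(TRANSITIONS, INITIAL_MARKING, 5):
        print(" -> ".join(trace))
```

Every trace produced from this toy net starts with register and check and ends with either approve or reject. More sophisticated scenarios (timestamps, resources, noise) would need a richer model than this sketch.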

Process Mining Using Neural Networks

Process mining [1] is a modern research discipline that aims to improve processes (mostly business processes) on the basis of so-called event logs. Event logs are created by the information systems that support processes (process-aware information systems, PAIS) as a technical by-product of their operation. At the same time, event logs are a source of valuable information about how processes actually run. Thus, event logs can be used to extract process models (the discovery task), to check the real behaviour of a process against its prescribed model (the conformance checking task), and, ultimately, to improve processes (the process enhancement task).

By now, there is a large number of algorithms, methods, and techniques for extracting and analyzing process models. They are mostly based on purely algorithmic or mixed statistical approaches. However, only a few works use neural network-based approaches to process event data; see, for example, [2].

The use of neural networks and deep learning for solving process mining problems seems to be very productive, especially for working with the large event logs that one has to deal with in practice. The project is dedicated to further developing existing process mining algorithms and designing new ones, as well as methods and tools that use neural networks.


[1] Wil van der Aalst. Process Mining: Data Science in Action. 2nd ed. Springer, 2016.

[2] T. Shunin, N. Zubkova, and S. Shershakov. Neural Approach to the Discovery Problem in Process Mining. In Proceedings: AIST 2018 Conference, LNCS 11179, pp. 261–273, Springer, 2018.

Universality Classes in Sandpiles

The project deals with self-organized critical systems and the prediction of rare events generated by them. Self-organized criticality (SOC) is ascribed to various observed systems and processes in physics and economics. The first model with SOC was defined by Bak, Tang, and Wiesenfeld in 1987. They introduced a mechanism (the BTW mechanism) which balances two multi-scale processes: a constant slow loading and a quick stress release. This mechanism results in power-law probability distributions that characterize the critical state but lacks parameters to fit the model regularities to observations.

The planned contribution of this project is twofold. First, we intend to establish that the BTW mechanism can be designed on a square lattice in such a way that the power-law exponent becomes adjustable. This contribution will extend the current picture of sandpile models with two universality classes realized in various settings. The extension is based on the idea that several processes of stress propagation should be united into a single event if they occur next to one another in space and time. The second direction of this project pursues the prediction of extremes. We plan to address the prediction problem in sandpiles, exposing the predictability of events that are so large that they occur only in the supercritical state. In other words, extra system loading is required to make the system generate extremes. This natural property underlies predictability. The research has to reconcile general claims regarding unpredictability in sandpiles, which may describe typical events, with some examples of efficient predictions related to large events.
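For readers unfamiliar with the BTW mechanism, the following Python sketch simulates the classical sandpile on a square lattice: grains are added one at a time (slow loading), and any site holding four or more grains topples, passing one grain to each neighbour (fast stress release). This is the standard textbook model, not the modified mechanism proposed in the project; the lattice size, grain count, and function names are illustrative assumptions.

```python
import random


def relax(grid, threshold=4):
    """Topple unstable sites until the lattice is stable; return the avalanche size.

    A site holding `threshold` or more grains passes one grain to each of its
    four neighbours. Grains pushed over the boundary leave the system.
    The avalanche size is counted as the number of topplings.
    """
    n = len(grid)
    size = 0
    unstable = [(i, j) for i in range(n) for j in range(n) if grid[i][j] >= threshold]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < threshold:
            continue  # stale entry: the site was already relaxed
        grid[i][j] -= threshold
        size += 1
        if grid[i][j] >= threshold:
            unstable.append((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                grid[a][b] += 1
                if grid[a][b] >= threshold:
                    unstable.append((a, b))
    return size


def run(n=20, grains=5000, seed=0):
    """Drop grains at random sites and record the avalanche size after each drop."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        i, j = rng.randrange(n), rng.randrange(n)
        grid[i][j] += 1            # slow loading: one grain at a time
        sizes.append(relax(grid))  # quick stress release: the avalanche
    return grid, sizes


if __name__ == "__main__":
    _, sizes = run()
    print("largest avalanche:", max(sizes))
```

A histogram of the recorded avalanche sizes on a log-log scale is the usual way to inspect the power-law behaviour mentioned above.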

Project 1. Coordinates and singularities of misorientation spaces

A misorientation is a measure of the displacement of a crystal lattice relative to another lattice. The space of all misorientations is a two-sided identification space G1\SO(3)/G2, where G1 and G2 are the crystallographic point groups of the lattices. It can be seen that a misorientation space is a 3-dimensional orbifold. The underlying topological space of this orbifold depends on the crystallographic groups. However, surprisingly often this space is homeomorphic to the 3-sphere by the Poincaré conjecture. During the internship, you can do one of the following two tasks (or both). (1) For particular choices of crystallographic groups, construct convenient coordinates on the corresponding misorientation spaces (in particular, you can try to prove the homeomorphism to the 3-sphere without referring to the Poincaré conjecture). (2) As a 3-orbifold, each misorientation space has a 3-valent weighted graph as its orbifold locus, or singularity. The task is to describe the orbifold singularities for all pairs of crystallographic point groups. This problem can be approached either by visualizing things in Python or by careful examination of some known papers in invariant theory and low-dimensional topology.

Project 2. TDA and FCA in experiments with neurons

There is a famous experiment confirming the existence of so-called place cells in the mammalian hippocampus. In this experiment, a mouse moves freely in a labyrinth while the activity of some set of its neurons is recorded. The general goal is to understand the neurons' activation patterns and their relation to the shape of the labyrinth, the trajectory of the rodent, and so on. Generally, one can reconstruct the topology of the labyrinth from neural activity using the Nerve lemma. Instead of taking the nerves of the covers, one can try to find implications between different neurons by constructing the lattice of formal concepts that relates the neural activity to the physical position of the mouse. One can then study the topology of this lattice instead of the nerve. During the internship, you will join our research team working on this subject and work with data obtained in real experiments. Your particular task will be to understand the theoretical basics of formal concept analysis and to review the packages that can be combined with existing TDA packages to carry out the task described above. As a minimum, we expect that you will be able to describe some basic implications between the activities of different groups of neurons.
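As an illustration of what an implication between neurons means in formal concept analysis, here is a small Python sketch. The data are invented for the example: each object is assumed to be a time bin and each attribute a neuron that was active in it. An implication holds when every time bin in which all premise neurons are active also has all conclusion neurons active.

```python
# Hypothetical toy context (not real experimental data):
# each time bin maps to the set of neurons active in it.
CONTEXT = {
    "t1": {"n1", "n2"},
    "t2": {"n1", "n2", "n3"},
    "t3": {"n2"},
    "t4": {"n3"},
}


def extent(context, attrs):
    """Objects (time bins) in which every neuron in `attrs` is active."""
    return {obj for obj, active in context.items() if attrs <= active}


def intent(context, objs):
    """Neurons that are active in every object of `objs`."""
    result = set().union(*context.values())
    for obj in objs:
        result &= context[obj]
    return result


def closure(context, attrs):
    """The closure attrs'' obtained by applying both derivation operators."""
    return intent(context, extent(context, attrs))


def holds(context, premise, conclusion):
    """True if the implication premise -> conclusion is valid in the context."""
    return set(conclusion) <= closure(context, set(premise))


def single_neuron_implications(context):
    """All valid implications a -> b between two distinct single neurons."""
    neurons = sorted(set().union(*context.values()))
    return [
        (a, b)
        for a in neurons
        for b in neurons
        if a != b and holds(context, {a}, {b})
    ]


if __name__ == "__main__":
    for a, b in single_neuron_implications(CONTEXT):
        print(f"{a} active implies {b} active")
```

In this toy context the only implication between single neurons is that n1 being active implies n2 being active. Real recordings are noisy, so exact implications like this one may need to be relaxed; that choice is outside the scope of this sketch.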

Geometry of J

The first goal of this project is to further understand the geometric properties of the Jensen-Shannon metric J. For instance, one objective would be to study whether the set P(R^d), endowed with J, is a geodesic metric space, and to characterize its shortest paths and, eventually, its curvature properties. Detailed description.
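For orientation, the sketch below computes a Jensen-Shannon distance between two probability vectors in Python. It assumes that J is the square root of the Jensen-Shannon divergence, which is the usual construction of the metric, and it works on distributions over a finite set rather than on P(R^d); the project's detailed description may use a different normalization or setting.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (assumed definition of J)."""
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(js_divergence(p, q), 0.0))


if __name__ == "__main__":
    p = [0.5, 0.5, 0.0]
    q = [0.1, 0.4, 0.5]
    print(js_distance(p, q))
```

With natural logarithms, the divergence between two distributions with disjoint supports equals log 2, so the distance is bounded by the square root of log 2.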

Barycenters

A related field of investigation is the study of barycenters relative to the metric J. Natural questions are whether such barycenters exist, whether they are unique, and whether they can be estimated consistently. These questions could be studied in the light of the recent paper by Le Gouic and Loubes (2017). Detailed description.

Variance inequalities

Given the recent work by Ahidar-Coutrix et al. (2018), another interesting question is whether one can establish so-called variance inequalities for the metric J. Such an inequality was shown to guarantee fast convergence rates for certain estimators of barycenters in Ahidar-Coutrix et al. (2018) and would be of great interest from a statistical point of view. Detailed description.

Probably Approximately Correct Learning of Ontologies Based on Description Logics

In description logic knowledge bases, there may be a need to ensure that the terminological part of the base is complete in the sense that it captures all relevant relations between concepts. There is a method for Completing Description Logic Knowledge Bases using Formal Concept Analysis, but it takes exponential time in the worst case. During the internship, you will have to develop a more efficient probably approximately correct (PAC) version of this method. By asking implication questions to a domain expert, the method should be able to approximate the relationships between a given set of concept names that hold in the expert’s model and enrich the knowledge base with the newly discovered relationships. Alternatively (and somewhat more ambitiously), we may choose to study PAC learnability of ontologies based on specific description logics (see Exact Learning of Lightweight Description Logic Ontologies for results concerning exact rather than PAC learning of such ontologies).

Collaborative Conceptual Exploration

Conceptual exploration is a family of knowledge-acquisition techniques based on formal concept analysis (Ganter and Obiedkov 2016). The goal is to build a complete (with respect to a fixed language) implicational theory of a domain by posing queries to the domain expert. When properly implemented, it is a great tool that can help organise the process of scientific discovery. The existing conceptual exploration methods assume a single oracle with a thorough knowledge of the domain. We aim to extend this model to a more practically relevant setting in which, in place of a single omniscient and unerring oracle, we have multiple experts who have incomplete or even contradictory knowledge of the domain. Such extensions are known for other active learning models (Donmez and Carbonell 2008, Yan et al. 2011, Chakraborty 2020) but will be novel in conceptual exploration.

Adapting exploration procedures to work with several imperfect experts requires modifying the mathematical model; developing new efficient algorithms; defining a strategy for the experts' interaction that allows for conflict resolution between contradictory opinions, together with techniques to combine the results of the independent work of several expert groups; and providing the ability to withdraw previous decisions based on new information (support for nonmonotonic reasoning).

As for the joint work of experts, the learning methods to be developed fall into two groups. If, for a chosen description language, there exists a unique correct domain description (up to semantically equivalent syntactic transformations), there is a need for methods that, under certain (e.g., probabilistic) assumptions on the completeness and correctness of the experts' knowledge, return an approximate (in a certain formal sense) domain description. If there is no unique correct description and ontology construction is aimed at establishing consensus between various expert representations of the domain (which may be important, for example, for the humanities), there is a need for methods that can identify the knowledge shared by all experts and highlight positions on which experts disagree.

During the internship, we will fix a precise problem statement of learning an implicational theory from queries to multiple experts and will try to adapt the existing learning algorithms to this setting.
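The basic operation behind an implicational theory can be sketched in a few lines of Python: closing a set of attributes under a set of implications, and checking whether a candidate implication already follows from the accepted ones. The comparison of two experts at the end is a naive illustration only, not the method to be developed in the project; the expert data and all names are hypothetical.

```python
def close(attrs, implications):
    """Smallest superset of `attrs` closed under all (premise, conclusion) pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def follows(premise, conclusion, implications):
    """True if premise -> conclusion is entailed by the given implications."""
    return set(conclusion) <= close(premise, implications)


def split_by_agreement(candidate, reference):
    """Split `candidate` implications into those entailed by `reference` and the rest.

    A naive comparison for illustration only (an assumption of this sketch).
    """
    shared, unconfirmed = [], []
    for premise, conclusion in candidate:
        target = shared if follows(premise, conclusion, reference) else unconfirmed
        target.append((premise, conclusion))
    return shared, unconfirmed


# Hypothetical implications accepted by two experts.
EXPERT_A = [({"bird"}, {"has_wings"}), ({"has_wings"}, {"flies"})]
EXPERT_B = [({"bird"}, {"has_wings"})]

if __name__ == "__main__":
    shared, unconfirmed = split_by_agreement(EXPERT_A, EXPERT_B)
    print("shared:", shared)
    print("not confirmed by expert B:", unconfirmed)
```

The naive fixed-point loop in `close` is quadratic in the worst case; linear-time closure algorithms exist, but this version keeps the logic easy to follow.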

Error Detection and Correction in Russian Texts Written by Non-native Speakers

General-purpose spellcheckers are usually designed to handle errors made by native speakers. Errors made by non-native speakers, e.g., language learners, are quite different, and they present a serious challenge for automated detection and correction. During the internship, you will study various approaches addressing this challenge, such as those described in the paper Grammar Error Correction in Morphologically-Rich Languages: The Case of Russian, and try to reproduce and improve results presented in the literature.

About HSE University

HSE University is one of the top universities in Russia and one of the leading universities in Eastern Europe and Eurasia. Having rapidly grown into a well-known research university, HSE University stands out for its international presence.

In March 2014, HSE University opened the Faculty of Computer Science jointly with Yandex. The faculty aims to train highly qualified data analysis specialists, software developers, and computer science researchers.

Frequently asked questions

Who can take part in an internship?

We invite students in bachelor's, master's, and doctoral programs. Experience in the internship area you are interested in plays an important role.

Do I need to provide an official English language certificate?

No, a certificate is not required. However, the interview with the prospective academic supervisor will be conducted in English.

Is there a fee for the internships?

All internships are free of charge.

Are there scholarships or financial support?

Yes, we cover airfare and accommodation.

Testimonials

Students from the University of Oxford (UK), École normale supérieure in Paris (France), the University of Padua (Italy), the University of Toulouse (France), Instituto Superior Técnico in Lisbon (Portugal), École Centrale de Marseille (France), and INSA Lyon (France) have taken part in internships at the HSE Faculty of Computer Science.