Коллоквиум

Для исследователя в такой разнообразной и быстро развивающейся области знаний, как компьютерные науки, важно сохранять широту кругозора и стремиться понимать, чем занимаются коллеги в смежных областях. Для этого нужна площадка, на которой специалисты встречаются и рассказывают друг другу о последних результатах понятным языком. Такой площадкой и является Коллоквиум факультета компьютерных наук ВШЭ — общефакультетский научный семинар, предназначенный для преподавателей и научных сотрудников факультета, аспирантов, магистрантов и студентов бакалавриата, а также для всех интересующихся компьютерными науками.

Заседания коллоквиума проходят, как правило, раз в две недели по вторникам в здании факультета компьютерных наук по адресу Кочновский проезд, дом 3, в аудитории 205.

Записи мероприятий доступны на канале факультета на YouTube

2017 – 2018

29 мая, 18:10 – 19:30
Кускова Валентина (НИУ ВШЭ)

In search of missing network data: applications to real-life problems

Over the years, our ability to impute missing data, including network nodes and edges, has improved significantly. Multiple algorithms of data imputation show excellent results in computer simulations, but when it comes to using them on real-life data, applied to real-life problems, several issues arise. First, no matter how advanced, simulation models still lack the complexity of human relationships, which is often at the forefront of applied social network analysis. Second, different algorithms produce confusing, often conflicting results on real-life data, with no good way to offer good theoretical interpretations. Finally, most algorithms are still rather complex to be implemented by social scientists, who do not have the same programming skills or access to software as computer scientists do.

In this talk, we will look at some real-life problems that arise with missing network data. We will report results on a study that combines real-life data with simulation techniques to address the impact that missing nodes and edges have on the interpretation of obtained results. We will also discuss limitations of currently available data imputation techniques in application to social sciences and outline key directions for future research.

17 апреля, 18:10 – 19:30, ауд. 205
Guilhem Gamard (HSE)

The Meltdown Attack

Modern CPU hardware implement a memory-protection mechanism to prevent one process from reading memory of another process. A few months ago, several vulnerabilities in this mechanism were published; this talk explains one of them, called Meltdown. This attack allows one process to read the whole memory of the machine on which it currently runs. This mostly concerns cloud-computing providers, as virtual machines running on the same physical server can spy each other.

Meltdown received vast coverage because it impacts virtually any Intel CPU currently on the market, and because it has existed for about 20 years before it was discovered. Operating systems vendors have implemented, in software, techniques to mitigate Meltdown; they claim that those security patches induce performance loss in running applications.

In this talk, we will review some features of modern CPUs, then we will explain how to exploit them to bypass memory protection. Finally, we will see how operating systems were modified to mitigate this risk.

Афиша

22 февраля, 18:10 – 19:30
Кочновский проезд, 3, ауд. 317

Eric Moulines (École Polytechnique, Франция)

Perturbed Proximal Gradient Algorithms

We study a version of the proximal gradient algorithm for which the gradient is intractable and is approximated by Monte Carlo methods (and in particular Markov Chain Monte Carlo). We derive conditions on the step size and the Monte Carlo batch size under which convergence is guaranteed: both increasing batch size and constant batch size are considered. We also derive non-asymptotic bounds for an averaged version. Our results cover both the cases of biased and unbiased Monte Carlo approximation. To support our findings, we discuss the inference of a sparse generalized linear model with random effect and the problem of learning the edge structure and parameters of sparse undirected graphical models.

Афиша

Для участия в мероприятии и для заказа пропуска в здание необходимо зарегистрироваться.

23 января 2018, 18:10 – 19:30

Кочновский проезд, 3, ауд. 205

Quentin Paris, НИУ ВШЭ

On empirical risk minimization and its variants for statistical learning

In this talk, we review fundamental principles of empirical risk minimization and its performance guarantees for statistical learning. We discuss the close interaction with the field of empirical processes and the connection to Vapnik–Chervonenkis combinatorics (including the notion of combinatorial dimension). We present the best known theoretical guarantees for the prediction error of empirical risk minimizers, discuss the limitations of the method, and mention some recent contributions.

Афиша коллоквиума

26 декабря 2017, 18:30 – 19:50

Кочновский проезд, 3, ауд. 205

Ben Livshits, Imperial College London

Just-in-Time Static Analysis

We present the concept of Just-In-Time (JIT) static analysis that interleaves code development and bug fixing in an integrated development environment. Unlike traditional static analysis tools, a JIT analysis tool presents warnings to code developers over time, providing the most relevant results quickly, and computing less relevant results incrementally later. This talk outlines general guidelines for designing JIT analyses.

We also present a general recipe for turning static data-flow analyses into JIT analyses through a concept of layered analysis execution illustrated through Cheetah, a JIT taint analysis for Android applications.

Our evaluation of Cheetah on real-world applications and our user study show that JIT analyses are able to present those warnings that are of importance to the code developers just-in-time, allowing them to start fixing problems immediately, without losing their context. Furthermore, study participants consistently reported higher satisfaction levels with our JIT analysis, compared to its traditional counterpart.

Афиша коллоквиума

19 декабря 2017, 18:10 – 19:30

Кочновский проезд, 3, ауд. 205

Boris Gutman, Imaging Genetics Center, University of Southern California

Brain Imaging and Genetics: Computational Methods and Applications

Neuroimaging and full genome sequencing in large samples promise to reveal subtle genetic and disease effects in the brain. Computational problems in imaging genetics arise from persistent themes in brain image analysis: high dimensionality and heterogeneity of the data, complex spatial processes, and great variety of imaging modalities. A variety of mathematical and statistical learning tools have been used in imaging over the years to solve these problems. In this talk, I will present some of the proposed solutions, including applications from continuum mechanics, differential geometry and shape spaces, and network analysis.

Афиша коллоквиума

29 ноября 2017, 16:40 – 18:00

Кочновский проезд, 3, ауд. 205

Vladimir Kolmogorov, IST Austria

Complexity Classifications of Valued Constraint Satisfaction Problems

Classifying complexity of different classes of optimization problems is an important research direction in Theoretical Computer Science. One prominent framework is Valued Constraint Satisfaction Problems (VCSPs) in which the class is parameterized by a "language", i.e. a set of cost functions over a fixed discrete domain. An instance of the problem is an arbitrary sum of functions from the language (possibly with overlapping variables), with the goal to minimize the sum. This framework can capture many classes of optimization problems such as 3-SAT, graph coloring, minimum vertex cover, submodular minimization, and others.

A series of recent papers, culminating with the works of D. Zhuk and A. Bulatov in 2017, has established a complete complexity classification of arbitrary languages. I will describe our contributions to this topic. One of the results is a reduction from general VCSPs to plain CSPs (i.e. to languages with {0, infinity}-valued functions). The key algorithmic tool that we use is a certain LP relaxation of the problem combined with the algorithm for plain CSPs.

In the second part of the talk, I will consider a version of the CSP framework where each variable must appear in exactly two constraints. It captures, in particular, the class of perfect matching problems, which can be solved by the Edmonds's blossom-shrinking algorithm. I will present a generalization of this algorithm that can handle "even Delta-matroid constraints". As a consequence of this, we settle the complexity classification of planar Boolean CSPs started by Dvořák and Kupec.

Афиша коллоквиума

28 ноября 2017, 18:10 – 19:30

Кочновский проезд, 3, ауд. 205

Andrea Burattin, DTU, Denmark

Online Process Mining

Process mining analyses data referring to executions of (business) processes. Typical approaches analyse historical execution data post-hoc, while online process mining aims at inferring information for running executions based "live data". Online mining carries scientific and computational challenges, since the processing has to take place immediately when events are emitted, possibly at very high speed. The benefits, however, are very relevant: online mining provides immediate knowledge on what is happening, thus detecting very early if the system is properly behaving and how its users interact with it. In this talk we focus on the online discovery of processes and how they evolve over time. Secondly, we present online conformance checking: detecting if – and to what extent – running executions are deviating from the reference behaviour they are supposed to follow.

Афиша коллоквиума

21 ноября 2017, 18:10 – 19:30

Кочновский проезд, 3, ауд. 205

Антон Осокин, НИУ ВШЭ

Как создавать нейросети на основе классических вычислительных алгоритмов?

За последние несколько лет технологии глубинного обучения позволили получить выдающиеся практические результаты в таких прикладных областях как компьютерное зрение и обработка естественного языка. Для создания моделей для практических задач чаще всего используют блоки (слои) из небольшого списка стандартных операций (полно-связные, свёрточные, рекуррентные слои). Ограниченность такого набора является одним из препятствий для переноса технологий на новые задачи. С другой стороны, для многих задач уже накоплено большое количество алгоритмов и практик, позволяющих получать хорошие результаты. Возможно ли строить глубинные модели не с чистого листа, а на основе уже существующих не-нейросетевых решений? В рамках этого доклада мы рассмотрим несколько способов построения нейросетей (или слоев нейросетей) на основе существующих алгоритмов из компьютерных наук. Будут затронуты прямое разворачивание алгоритмов в слои нейросетей, использование комбинаторной оптимизации для выбора активаций сети, дифференцирование результатов алгоритмов по входам. Мы посмотрим, как применять эти подходы на примере задач предсказания со структурированным выходом (structured-output prediction) и на их применения в задачах компьютерного зрения.

Афиша коллоквиума

24 октября 2017, 18:10 – 19:30

Кочновский проезд, 3, ауд. 205

Мария Попцова, НИУ ВШЭ

Методы машинного обучения и большие данные биоинформатики

Проект расшифровки первого генома человека занял 13 лет, потребовал около 1,5 миллиарда долларов и работы огромного числа институтов и университетов мира. Революция в технологиях секвенирования, произошедшая в начале 21 века, позволила сократить затраты до 2 дней и 1000 долларов. Технологии секвенирования следующего поколения (Next Generation Sequencing, или NGS) производят данные геномики, эпигеномики, транскриптомики, протеомики, метаболомики и других “омик” молекулярной биологии. Как только стало возможно секвенировать буквально “все”, были запущены международные консорциумные проекты, такие как проект 1000 геномов человека, Hap Map – исследование разнообразия человека на 450 геномов трех рас, ENCODE – энциклопедия ДНК-элементов, The Roadmap Epigenomics (маркировка эпигенетических факторов, формирующих ткани) и проекты по секвенированию всех типов раковых опухолей (The Cancer Genome Atlas, TCGA и International Cancer Genome Consortium, ICGC). Биоинформатика на наших глазах стала областью, быстро генерирующей большие данные, нуждающиеся в обработке и интерпретации. В лекции я расскажу о том, что это за данные с точки зрения аналитика данных и как методы машинного обучения успешно примененяются для решения задач аннотации и поиска новых связей между функциональными элементами генома. Предварительное знание биоинформатики не предполагается.

Афиша коллоквиума

5 октября 2017, 18:10 – 19:30

Кочновский проезд, 3, ауд. 205

Константин Воронцов, Яндекс/МФТИ/НИУ ВШЭ

Многокритериальный тематический анализ текстовых коллекций

Тематическое моделирование — это область статистического анализа текстов, активно развивающаяся последние 15 лет. Тематическая модель коллекции текстовых документов определяет, к каким темам относится каждый документ и какие слова образуют каждую тему. Построение тематической модели сводится к решению некорректно поставленной задачи матричного разложения. Для доопределения решения используются дополнительные критерии-регуляризаторы. Тематическое моделирование отличается огромным разнообразием регуляризаторов, с помощью которых можно строить тематические иерархии, учитывать лингвистические ограничения, вовлекать нетекстовые данные о времени, авторах, пользователях, ссылках, взаимосвязях. В лекции будет рассказано, как все эти ограничения формализуются на языке регуляризации, как их можно комбинировать друг с другом для построения моделей с заданными свойствами и как теория аддитивной регуляризации приводит к модульной технологии тематического моделирования.

Афиша коллоквиума

11 сентября 2017, 16:40 – 18:10

Кочновский проезд, 3, ауд. 317

Егор Костылев, University of Oxford

Расширение well-designed SPARQL

SPARQL — это стандартный язык запросов для RDF-данных. Одной из важнейших особенностей этого языка относительно многих других является оператор OPTIONAL, допускающий частичные ответы в тех случаях, когда полные ответы недоступны из-за недостатка данных. Однако запросы с OPTIONAL имеют высокую сложность — соответствующая проблема является PSPACE-полной. Фрагмент языка, допускающий только ограниченное использование OPTIONAL (так называемый well-designed SPARQL), имеет более низкую сложность — coNP. Однако, как показывают наши исследования, только чуть больше половины реальных запросов с OPTIONAL к DBpedia попадают в этот фрагмент. Мы предлагаем новый класс запросов, расширяющий well-designed SPARQL, который, во-первых, включает 99% запросов с OPTIONAL и, во-вторых, также имеет coNP-сложность вычисления. Мы также исследуем другие свойства нового фрагмента, такие как сложность проверки эквивалентности и вычислительная мощность.

Афиша коллоквиума

11 сентября 2017, 18:10 – 19:30

Кочновский проезд, 3, ауд. 317

Wray Buntine, Monash University

Learning on networks of distributions for discrete data

I will motivate the talk by reviewing some state of the art models for problems like matrix factorisation models for link prediction and tweet clustering. Then I will review the classes of distributions that can be strung together in networks to generate discrete data. This allows a rich class of models that, in its simplest form, covers things like Poisson matrix factorisation, Latent Dirichlet allocation, and Stochastic block models, but, more generally, covers complex hierarchical models on network and text data. The distributions covered include so-called non-parametric distributions such as the Gamma process. Accompanying these are a set of collapsing and augmentation techniques that are used to generate fast Gibbs samplers for many models in this class. To complete this picture, turning complex network models into fast Gibbs samplers, I will illustrate our recent methods of doing matrix factorisation with side information (e.g., GloVe word embeddings), done for link prediction, for instance, for citation networks.

Афиша коллоквиума

Коллоквиум

2017 – 2018

29 мая, 18:10 – 19:30
Кускова Валентина (НИУ ВШЭ)