# Problems in Theoretical Computer Science

This workshop is organized by the Laboratory of Theoretical Computer Science of the Higher School of Economics. The scope of the workshop covers various areas of Theoretical Computer Science. Two main scopes of this year's meeting are Computational Complexity and Theory of Algorithms.**Registration**

To attend the conference please fill-in a short registration form. If you do not have a pass to HSE buildings please bring an ID.

Workshop photos

### Workshop Program

**Alexander Shen (Montpellier University)**

It is well known that compression with finite automata is closely related to normal sequences (all substrings of given length have the same frequency). Still the most straightforward approach using automatic description modes was not used -- we will show how it could be done to provide simple proofs of classical results about normal sequences.

**Konstantin Makarychev (Microsoft Research)**

I will talk about the correlation clustering problem. In this problem, our goal is to partition a collection of items into groups of similar items. We assume that we are given a graph on the set of all items, in which every edge (x,y) is marked with a sign “+” or “-” by a noisy classifier. The “+” sign indicates that the objects x and y are similar (and thus should belong to the same cluster); the “-” sign indicates that the objects are dissimilar (and should belong to different clusters). The classifier makes errors and, therefore, the information we get is inconsistent with any clustering. We want to find a clustering that minimizes the number of misclassified edges i.e. the number of “+” edges crossing the boundary of the partition plus the number of “-” edges lying within one partition.

Correlation clustering has been extensively studied in theoretical computer science and machine learning communities. It is used for clustering data sets containing complex objects which cannot be embedded into Euclidean space in a natural way and, therefore, cannot be clustered using k-means and other standard algorithms. The problem is NP hard. I will present a 2.06 approximation algorithm for correlation clustering on complete graphs. I will also describe a semi-random model for correlation clustering with partial information, and give a PTAS (almost exact algorithm) for semi-random instances.

**Based on joint works with** Shuchi Chawla, Tselil Schramm, and Grigory Yaroslavtsev; and with Yury Makarychev and Aravindan Vijayaraghavan

**Gleb Evstropov (HSE)**

Consider some weighted tree $T$ and a positive integer $c$. Every edge $e \in E(T)$ is assigned some positive integer length $l: e \in E(T) \rightarrow Z_+$. The problem is to pick exactly one pair of vertecies $u$ and $v$ and connect them by an edge of length $c$ in order to minimize the diameter of the resulting graph. That is $G = (V(T), E(T) + \{u, v\}), l(\{u, v\}) = c$ and $d(G)$ is minimum possible.

We define $n$ as the number of vertices in $T$ and $M$ as the smallest integer such that $c, l(e) \leqslant M$. The naive solution complexity vary from $O(n^5)$ to $O(n^3)$. In this paper we optimize the naive solution step by step and reduce complexity first to $O(n^2 \log M)$, then $O(n \log n \log M)$ and finally to $O(n \log M)$.

**Yury Makarychev (Toyota Technological Institute)**

Based on joint work with Konstantin Makarychev and Aravindan Vijayaraghavan, and with Haris Angelidakis and Konstantin Makarychev.

**Nikolay Karpov (PDMI)**

We give FPT-algorithms deciding if a graph G with a given cost function contains a secluded path and a secluded Steiner tree of exposure at most k with the cost at most C.

We initiate the study of "above guarantee" parameterizations for secluded problems, where the lower bound is given by the size of a Steiner tree.

We investigate Secluded Steiner Tree from kernelization perspective and provide several lower and upper bounds when parameters are the treewidth, the size of a vertex cover, maximum vertex degree and the solution size. Finally, we refine the algorithmic result of Chechik et al. by improving the exponential dependence from the treewidth of the input graph.

**Bruno Bauwens (HSE)**

An infinite bitsequence is computably random if no there exists no computable betting strategy against subsequent bits of the sequence for which the capital is unbounded. Let $\lambda$ be the (uniform) Lebesgue measure. This criterium is equivalent to the criterium that there is no computable measure P, such that the ratio P/\lambda is unbounded on the basic open sets containing the sequence. From a statistical point of view, this seems to be one of the easiest definition of randomness.

Van Lambalgen's requirement for a definition of randomness states that the following should be equivalent:

-The odd bits of a sequence are random.

-The even bits of a sequence are random relative to the odd bits.

In 2007, Liang Yu showed that this requirement fails, because the upward implication is false. The downward implication is open question. In Yu's result the definition of conditional randomness is perhaps a bit strange: for a specific X he considers conditional measures P(.|X_1X_3X_5...) that are computable only for the specific value of X_1X_3X_5... Perhaps it is more natural to use conditional measures that are computable as two-argument functions. In this case, it was an open question whether Van Lambalgen's criterium holds. We give a negative answer for both open questions.

**Dmitry Itsykson (PDMI)**

In 2004 Atserias, Kolaitis and Vardi proposed OBDD-based propositional proof systems that prove unsatisfiability of a CNF formula by deducing of identically false OBDD from OBDDs representing clauses of the initial formula. All OBDDs in such proofs have the same order of variables. We initiate the study of OBDD based proof systems that additionally contain a rule that allows to change the order in OBDDs. At first we consider a proof system OBDD(\land, reordering) that uses the conjunction (join) rule and the rule that allows to change the order. We exponentially separates this proof system from OBDD(\land)-proof system that uses only conjunction rule. We prove two exponential lower bounds on the size of OBDD(\land, reordering)-refutation of Tseitin formulas and the pigeonhole principle. The first lower bound was previously unknown even for OBDD(\land)-proofs and the second one extends the result of Tveretina et. al from OBDD(\land) to OBDD(\land, \reordering).

In 2004 Pan and Vardi proposed an approach to the propositional satisfiability problem based on OBDDs and symbolic quantifier elimination (we denote algorithms based on this approach as OBDD(\land, \exists)-algorithms). An instance of the propositional satisfiability problem is considered as existential quantified propositional formula. The algorithm chooses an order on variables and creates an ordered binary decision diagram (OBDD) D that initially represents the constant 1 function. Then the algorithm downloads to D clauses of the CNF one by one and applies to D the elimination of the existential quantifier for variable x if all clauses that contain x are already downloaded. We augment these algorithms with the operation of reordering of variables and call the new scheme OBDD(\land, \exists, \reordering)-algorithms. We notice that there exists an \OBDD(\land, \exists)-algorithm that solves satisfiable and unsatisfiable Tseitin formulas in polynomial time. In contrast, we show that there exist formulas representing systemsсof linear equations over GF(2) that are hard for OBDD(\land, \exists, reordering)-algorithms. Our hard instances are satisfiable formulas representing systems of linear equations over GF(2) that correspond to some checksum matrices of error correcting codes.

Joint work with Alexander Knop, Andrey Romashchenko and Dmitry Sokolov.

**Dmitry Sokolov (PDMI)**

In the classical communication complexity there is a natural correspondence between protocols and trees. In this talk we consider a notion of communication protocols that correspond to DAGs. This notion is a simplification of communication PLS games that are introduced by Razborov. Also we consider a DAG-like real valued communication protocols.

Consider a monotone boolean function $f: {0, 1}^n \to {0, 1}$. KW-relation is a communication problem, Alice receives $x \in f^{-1}(1)$, Bob receives $y \in f^{-1}(0)$ and their goal is to find a monotone bit of difference. There is a well known statement that communication complexity of KW relation equals to

monotone depth of formulas. We consider a relation between size of monotone boolean (real) circuits and DAG-like (real valued) communication protocols, and show a lower bounds on this protocols. As a corollary this lower bound give us a lower bound on monotone real circuit size for SAT-CSP problem.

We also consider a relation between DAG-like protocols and proof complexity and improve Krajicek's theorem that help us to give a lower bound on Random Cutting Planes proof system.

**Alexander Razborov (MIAN, Chicago University)**,

Tradeoff results in Complexity Theory capture our inherent inability to excel in two conflicting tasks at once. Thet normally have the following structure: any procedure that optimizes in one complexity resource must necessarily perform very badly in another, typically as bad as a "generic", "naive" or "brute-force" procedure for the whole class of problems under consideration.

In this talk we will survey a much stronger class of tradeoffs that have recently emerged in proof complexity in the work of the speaker and other researchers. The main difference from classical tradeoff results is that now the competing procedure (that is, propositional proof in our context) must be *exponentially worse* than the naive one. Altogether, this typically results in *doubly* exponential lower bounds in terms of the input size.

The talk will be practically self-contained; definitions from proof complexity will be reminded as necessary.

**Alexander Kozachinski (HSE), Egor Klenin (MSU)**

Assume that Alice has a binary string $x$ and Bob a binary string $y$, both of length $n$. Their goal is to output 0, if $x$ and $y$ are at least $L$-close in Hamming distance, and output 1, if $x$ and $y$ are at least $U$-far in Hamming distance, where $L < U$ are some integer parameters known to both parties. If the Hamming distance between $x$ and $y$ lies in the interval $(L, U)$, they are allowed to output anything. This problem is called the Gap Hamming Distance. We study public-coin one-sided error communication complexity of this problem. The error with probability at most 1/2 is allowed only for pairs at Hamming distance at least $U$. We establish the upper bound $O((L^2/U)\log L)$ and the lower bound $\Omega(L^2/U)$ for this complexity. These bounds differ only by a $O(\log L)$ factor.

**Edward Hirsch (PDMI)**

functions seems to be out of reach of the current techniques. There has been a (very slow) progress in proving linear lower bounds with the latest record of $3\frac1{86}n-o(n)$. All known lower bounds are based on the so-called gatel elimination technique. A typical gate elimination argument shows that it is possible to eliminate several gates from an optimal circuit by making one or several substitutions to the input variables and repeats this inductively.

In this note we prove that this method cannot achieve linear bounds of $cn$ beyond a certain constant $c$, where $c$ depends only on the number of substitutions made at a single step of the induction.

**Alexey Milovanov (HSE)**

We improve and simplify the result of the part 4 of ``Counting curves and their projections'' (Joachim von zur Gathen, Marek Karpinski, Igor Shparlinski) by showing that counting roots of a sparse polynomial over finite field is #P-complete under deterministic reductions.