# Publications

Recent research has reported that standard fine-tuning approaches can be unstable because they are prone to various sources of randomness, including weight initialization, training data order, and hardware. Such brittleness can lead to different evaluation results, prediction confidences, and generalization behavior across instances of the same model independently fine-tuned under the same experimental setup. Our paper explores this problem in natural language inference, a common task in benchmarking practices, and extends the ongoing research to the multilingual setting. We propose six novel textual entailment and broad-coverage diagnostic datasets for French, German, and Swedish. Our key findings are that the mBERT model demonstrates fine-tuning instability for categories that involve lexical semantics, logic, and predicate-argument structure and struggles to learn monotonicity, negation, numeracy, and symmetry. We also observe that using extra training data only in English can improve both generalization performance and fine-tuning stability, which we attribute to cross-lingual transfer capabilities. However, the ratio of particular linguistic features in the additional training data can instead hurt the performance of individual model instances. We publicly release the datasets, hoping to foster the diagnostic investigation of LMs in a cross-lingual scenario, particularly in terms of benchmarking, and to promote a more holistic understanding of multilingualism in LMs and of cross-lingual knowledge transfer.

We consider the problem of efficiently determining the data complexity of answering queries mediated by description logic ontologies and of constructing their optimal rewritings to standard database queries. Originating in ontology-based data access and datalog optimisation, this problem is known to be computationally very complex in general, with no explicit syntactic characterisations available. In this article, aiming to understand the fundamental roots of this difficulty, we strip the problem to the bare bones and focus on Boolean conjunctive queries mediated by a simple covering axiom stating that one class is covered by the union of two other classes. We show that, on the one hand, these rudimentary ontology-mediated queries, called disjunctive sirups (or d-sirups), capture many features and difficulties of the general case. For example, answering d-sirups is Π₂ᵖ-complete for combined complexity and can be in AC⁰ or L-, NL-, P-, or coNP-complete for data complexity (with the problem of recognising FO-rewritability of d-sirups being 2ExpTime-hard); some d-sirups only have exponential-size resolution proofs, some only double-exponential-size positive existential FO-rewritings and single-exponential-size nonrecursive datalog rewritings. On the other hand, we prove a few partial sufficient and necessary conditions for FO- and (symmetric/linear-) datalog rewritability of d-sirups. Our main technical result is a complete and transparent syntactic AC⁰/NL/P/coNP tetrachotomy of d-sirups with disjoint covering classes and a path-shaped Boolean conjunctive query. To obtain this tetrachotomy, we develop new techniques for establishing P- and coNP-hardness of answering non-Horn ontology-mediated queries, as well as for showing that they can be answered in NL.

We derive rank bounds on the quantized tensor train (QTT) compressed approximation of singularly perturbed reaction-diffusion boundary value problems in one dimension. Specifically, we show that, independently of the scale of the singular perturbation parameter, a numerical solution with accuracy 0 < ε < 1 can be represented in the QTT format with a number of parameters that depends only polylogarithmically on ε. In other words, QTT-compressed solutions converge exponentially fast to the exact solution with respect to a root of the number of parameters. We also verify the rank bound estimates numerically and overcome known stability issues of the QTT-based solution of partial differential equations (PDEs) by adapting a preconditioning strategy to obtain stable schemes at all scales. We find, therefore, that the QTT-based strategy is a rapidly converging algorithm for the solution of singularly perturbed PDEs that does not require prior knowledge of the scale of the singular perturbation or of the shape of the boundary layers.
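As a minimal illustration of the quantization idea behind QTT (our own sketch, not the paper's solver): a vector of 2^d samples of an exponential, reindexed by the binary digits of the grid index, factorizes exactly over the digits, i.e. it has QTT rank 1. Boundary layers of reaction-diffusion solutions are built from such exponentials, which hints at why low QTT ranks suffice.

```python
# Reindex samples of f(i) = q**i on i = 0, ..., 2**d - 1 by the binary
# digits of i; the exponential splits into one scalar factor per digit,
# so the quantized tensor has rank 1.
d, q = 10, 0.5

def bits(i, d):
    """Binary digits of i, least significant first."""
    return [(i >> k) & 1 for k in range(d)]

for i in range(2 ** d):
    # f(i) = prod_k (q**(2**k))**b_k: one scalar "core" per digit.
    factored = 1.0
    for k, b in enumerate(bits(i, d)):
        factored *= (q ** (2 ** k)) ** b
    assert abs(factored - q ** i) < 1e-12
```

Smooth solutions with sharp exponential layers are sums of a few such terms, which is consistent with the polylogarithmic parameter bounds above.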

A method of arm aiming direction estimation for low-performance Internet of Things devices is proposed. It uses Human Pose Estimation (HPE) algorithms to retrieve human skeleton key points, from which a model of the arm aiming direction is calculated. Two well-known HPE methods (PoseNet and OpenPose) are examined; these algorithms have been tested and compared by their average angular error. The system includes a Raspberry Pi 4B single-board computer and an Intel RealSense D435i depth sensor. The developed approach may be utilized in “smart home” gesture control systems.
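A sketch of the geometric step, assuming the common convention of taking the normalized shoulder-to-wrist vector as the aiming direction (the key-point layout and names here are our illustration, not the paper's code); methods can then be compared by the angle between estimated and ground-truth directions:

```python
import math

def aim_direction(shoulder, wrist):
    """Unit vector from a 3-D shoulder key point to a wrist key point."""
    v = [w - s for s, w in zip(shoulder, wrist)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def angle_error_deg(d1, d2):
    """Angle between two unit direction vectors, in degrees."""
    dot = sum(a * b for a, b in zip(d1, d2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Toy numbers: a slightly noisy estimate vs. a 45-degree ground truth.
est = aim_direction([0.0, 0.0, 0.0], [0.7, 0.1, 0.7])
truth = [1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
assert angle_error_deg(est, truth) < 10.0  # a few degrees of error
```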

An extension of an induced path P in a graph G is an induced path P′ such that deleting the endpoints of P′ results in P. An induced path in a graph is said to be avoidable if each of its extensions is contained in an induced cycle. In 2019, Beisegel, Chudnovsky, Gurvich, Milanič, and Servatius conjectured that every graph that contains an induced k-vertex path also contains an avoidable induced path of the same length, and proved the result for k = 2. The case k = 1 was known much earlier, due to a work of Ohtsuki, Cheung, and Fujisawa in 1976. The conjecture was proved for all k in 2020 by Bonamy, Defrain, Hatzel, and Thiebaut. In the present paper, using a similar approach, we strengthen their result from a reconfiguration point of view. Namely, we show that in every graph, each induced path can be transformed into an avoidable one by a sequence of shifts, where two induced k-vertex paths are shifts of each other if their union is an induced path with k + 1 vertices. We also obtain analogous results for not necessarily induced paths and for walks. In contrast, the statement cannot be extended to trails or to isometric paths.
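A brute-force check of the shift relation on a tiny example (the graph and helper names are ours): two induced k-vertex paths are shifts of each other when their union induces a path on k + 1 vertices.

```python
from itertools import permutations

def is_induced_path(g, verts):
    """True iff some ordering of verts is a path with no chords in g."""
    vs = list(verts)
    if len(vs) <= 1:
        return True
    for order in permutations(vs):
        consecutive = all(order[i + 1] in g[order[i]]
                          for i in range(len(order) - 1))
        chordless = all(order[j] not in g[order[i]]
                        for i in range(len(order))
                        for j in range(i + 2, len(order)))
        if consecutive and chordless:
            return True
    return False

def are_shifts(g, p1, p2, k):
    union = set(p1) | set(p2)
    return len(union) == k + 1 and is_induced_path(g, union)

# Path graph 1-2-3-4: the induced 3-paths {1,2,3} and {2,3,4} are shifts.
g = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
assert are_shifts(g, {1, 2, 3}, {2, 3, 4}, 3)
assert not are_shifts(g, {1, 2, 3}, {1, 2, 3}, 3)  # union has only k vertices
```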

Style transfer is an important and rapidly developing area of Natural Language Processing. These days, more and more methods and models are proposed that allow us to generate text in a predefined style. In this paper we propose a framework for style transfer based on the “Friends” TV series. The trained models are able to mimic one of the six main characters of this famous TV series in English and Russian. We also present a dialogue dataset of “Friends” subtitles in English together with its automatic Russian translation. In addition, we perform a multilingual comparison of “Friends” style transfer in the two considered languages.

A *d*-limited automaton is a nondeterministic Turing machine that uses only the tape cells containing the input word (and the end-markers) and may rewrite the symbol in a cell only during the first *d* visits. This model was introduced by T. Hibbard in 1967, who showed that *d*-limited automata recognize exactly the context-free languages for each d ≥ 2. He also proved that the languages recognizable by deterministic *d*-limited automata form a hierarchy, which, as shown later by Pighizzini and Pisoni, begins with the deterministic context-free languages (DCFLs) for d = 2.

As is well known, DCFLs are widely used in practice, especially in compilers, since they are recognizable in linear time and have a corresponding subclass of context-free grammars (the LR(1) grammars). In this paper we present a linear-time recognition algorithm for deterministic *d*-limited automata (in the RAM model), which opens an opportunity for their practical application.

In the modern world, there is a huge increase in the usage of various technologies that may produce heterogeneous data. Using the latest security algorithms, such as the Triple Data Encryption Standard (TDES), Rivest–Shamir–Adleman (RSA), and the Advanced Encryption Standard (AES), these (heterogeneous) data are transferred over the internet. Despite such good security mechanisms, vulnerabilities of user data remain. In recent years, researchers have started to apply bio-inspired computing models to security. In this paper, two such bio-inspired computing models, namely the splicing system and Deoxyribonucleic acid (DNA) sequencing, are used. We propose a novel cryptosystem using a contextual array splicing system for DNA-sequenced input data. In this process, we design Turing machines that perform the conversion of a binary input to a DNA sequence and the cryptosystem operation using the contextual array splicing system. Next, the developed cryptosystem is applied in the domain of the Medical Internet of Things (MIoT). Further, we investigate the proof of correctness. Finally, the proposed cryptosystems are proved to be Turing computable.
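The paper performs the binary-to-DNA conversion with a Turing machine; below is only the underlying coding step, two bits per nucleotide. The mapping 00→A, 01→C, 10→G, 11→T is a common convention and our assumption here, not necessarily the paper's table.

```python
# Assumed 2-bits-per-nucleotide mapping (illustrative, not the paper's).
TO_DNA = {"00": "A", "01": "C", "10": "G", "11": "T"}
FROM_DNA = {v: k for k, v in TO_DNA.items()}

def binary_to_dna(bits):
    if len(bits) % 2:          # pad to an even number of bits
        bits += "0"
    return "".join(TO_DNA[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_binary(seq):
    return "".join(FROM_DNA[c] for c in seq)

assert binary_to_dna("00011011") == "ACGT"
assert dna_to_binary("ACGT") == "00011011"
```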

In scientific computing and machine learning applications, matrices and, more generally, multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions. Since matrices and tensors of fixed rank form smooth Riemannian manifolds, one of the popular tools for finding low-rank approximations is Riemannian optimization. Nevertheless, efficient implementation of the Riemannian gradients and Hessians required in Riemannian optimization algorithms can be a nontrivial task in practice; moreover, in some cases, analytic formulas are not even available. In this paper, we build upon automatic differentiation and propose a method that, given an implementation of the function to be minimized, efficiently computes Riemannian gradients and products of an approximate Riemannian Hessian with a given vector.
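A pure-Python sketch (ours, not the paper's autodiff method) of the object being computed in the simplest case: on the manifold of rank-1 matrices X = u vᵀ with unit u and v, the Riemannian gradient is the projection of the Euclidean gradient Z onto the tangent space, Proj(Z) = u uᵀ Z + Z v vᵀ − u uᵀ Z v vᵀ, and a projection must be idempotent.

```python
def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

u = [1.0, 0.0]          # unit vectors spanning the column/row spaces
v = [0.6, 0.8]
Pu, Pv = outer(u, u), outer(v, v)   # orthogonal projectors u u^T, v v^T
Z = [[1.0, 2.0], [3.0, 4.0]]        # stand-in for a Euclidean gradient

def proj(Z):
    """Tangent-space projection at X = u v^T on the rank-1 manifold."""
    t = add(matmul(Pu, Z), matmul(Z, Pv))
    return sub(t, matmul(Pu, matmul(Z, Pv)))

P1, P2 = proj(Z), proj(proj(Z))
assert all(abs(P1[i][j] - P2[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The paper's point is that such projections (and Hessian-vector products) can be obtained automatically from the objective's code instead of hand-derived formulas like the one above.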

We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at only a fraction of their entries. Our method combines a neural network encoder with a tensor train decomposition to learn a low-rank latent encoding, coupled with cross-approximation (CA) to learn the representation through a subset of the original samples. CA is an adaptive sampling algorithm that is native to tensor decompositions and avoids working with the full high-resolution data explicitly. Instead, it actively selects local representative samples that we fetch out-of-core and on demand. The required number of samples grows only logarithmically with the size of the input. Our implicit representation of the tensor in the network enables processing large grids that would otherwise be intractable in their uncompressed form. The proposed approach is particularly useful for large-scale multidimensional grid data (e.g., 3D tomography) and for tasks that require context over a large receptive field (e.g., predicting the medical condition of entire organs). The code is available at https://github.com/aelphy/c-pic.
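The simplest instance of the cross-approximation idea, sketched in pure Python (our illustration, not the paper's implementation): from a single row and column of a matrix, reconstruct Â[i][j] = A[i][j*]·A[i*][j]/A[i*][j*]. For a rank-1 matrix this recovers every entry while reading only one "cross" of samples.

```python
# Rank-1 test matrix A = u v^T.
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
A = [[ui * vj for vj in v] for ui in u]

pi, pj = 0, 0   # pivot; maxvol-style CA would pick a large-volume pivot
A_hat = [[A[i][pj] * A[pi][j] / A[pi][pj] for j in range(3)]
         for i in range(3)]

# The cross through the pivot reproduces the whole rank-1 matrix.
assert all(abs(A_hat[i][j] - A[i][j]) < 1e-12
           for i in range(3) for j in range(3))
```

Higher-rank CA repeats this with several adaptively chosen rows and columns, which is what keeps the number of fetched samples small.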

We consider a generalization of the Pólya–Kasteleyn approach to counting perfect matchings in a graph, based on computing the symbolic Pfaffian of a directed adjacency matrix of the graph. The complexity of algorithms based on this approach is related to the complexity of the sign function of a perfect matching in generalized decision tree models. We obtain lower bounds on the complexity of the sign of a perfect matching in terms of the deterministic communication complexity of the XOR sign function of a matching. These bounds demonstrate the limitations of the symbolic Pfaffian method both in the general case and in the case of sparse graphs.
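The Pólya–Kasteleyn approach on the smallest interesting example (our own illustration): orient the 4-cycle as 1→2, 2→3, 3→4, 1→4, which is a Pfaffian orientation, and the absolute value of the Pfaffian of the signed skew-symmetric adjacency matrix counts the perfect matchings.

```python
# Signed skew-symmetric adjacency matrix of C4 under the orientation above.
A = [[0, 1, 0, 1],
     [-1, 0, 1, 0],
     [0, -1, 0, 1],
     [-1, 0, -1, 0]]

def pfaffian4(a):
    """Closed form for a 4x4 skew-symmetric matrix."""
    return a[0][1] * a[2][3] - a[0][2] * a[1][3] + a[0][3] * a[1][2]

# C4 has exactly two perfect matchings: {12, 34} and {23, 41}.
assert abs(pfaffian4(A)) == 2
```

The paper's lower bounds concern how hard it is to realize the sign pattern behind such cancellations symbolically for general (and sparse) graphs.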

We show that deciding boundedness (a.k.a. FO-rewritability) of monadic single rule datalog programs (sirups) is 2ExpTime-hard, which matches the upper bound known since 1988 and finally settles a long-standing open problem. We obtain this result as a byproduct of an attempt to classify monadic 'disjunctive sirups' (Boolean conjunctive queries q with unary and binary predicates mediated by a disjunctive rule T(x) ∨ F(x) ← A(x)) according to the data complexity of their evaluation. Apart from establishing that deciding FO-rewritability of disjunctive sirups with a dag-shaped q is also 2ExpTime-hard, we make substantial progress towards obtaining a complete FO/L-hardness dichotomy of disjunctive sirups with ditree-shaped q.

We study the problem of whether a given finite algebra with finitely many basic operations contains a cube term; we give both structural and algorithmic results. We show that if such an algebra has a cube term, then it has a cube term of dimension at most *N*, where the number *N* depends on the arities of the basic operations of the algebra and on the size of the base set. For finite idempotent algebras we give a tight bound on *N* that, in the special case of algebras with more than |A|-choose-2 basic operations, improves an earlier result of K. Kearnes and Á. Szendrei. On the algorithmic side, we show that deciding the existence of cube terms is in P for idempotent algebras and in EXPTIME in general. Since an algebra contains a *k*-ary near unanimity operation if and only if it contains a *k*-dimensional cube term and generates a congruence distributive variety, our algorithm also lets us decide whether a given finite algebra has a near unanimity operation.
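A brute-force check of the near-unanimity condition mentioned above (the example and names are ours): a k-ary operation m is a near unanimity operation if m(y, x, …, x) = m(x, y, x, …, x) = … = m(x, …, x, y) = x for all x, y in the base set. Ternary majority on {0, 1} qualifies; a projection does not.

```python
from itertools import product

def is_near_unanimity(op, k, universe):
    """Check m(x,...,y,...,x) = x for every x, y and every position of y."""
    for x, y in product(universe, repeat=2):
        for pos in range(k):
            args = [x] * k
            args[pos] = y
            if op(*args) != x:
                return False
    return True

assert is_near_unanimity(lambda a, b, c: int(a + b + c >= 2), 3, [0, 1])
assert not is_near_unanimity(lambda a, b, c: a, 3, [0, 1])
```

This naive check is exponential in the arity; the point of the paper is deciding the existence of such terms for algebras given by generating operations, which is far less direct.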

We study the two-party problem of randomly selecting a common string among all the strings of length *n*. We want the protocol to have the property that the output distribution has high *Shannon entropy* or high *min entropy*, even when one of the two parties is dishonest and deviates from the protocol. We develop protocols that achieve Shannon entropy close to *n* and simultaneously min entropy close to *n*/2. In the literature the randomness guarantee is usually expressed in terms of “resilience”. The notion of Shannon entropy is not directly comparable to that of resilience, but we establish a connection between the two that allows us to compare our protocols with the existing ones. We construct an explicit protocol that yields Shannon entropy *n* − *O*(1) and has *O*(log∗ *n*) rounds, improving over the protocol of Goldreich et al. (SIAM J Comput 27: 506–544, 1998) that also achieves this entropy but needs *O*(*n*) rounds. Both these protocols need *O*(*n*²) bits of communication. Next we reduce the number of rounds and the length of communication in our protocols. We show the existence, non-explicitly, of a protocol that has 6 rounds, *O*(*n*) bits of communication, and yields Shannon entropy *n* − *O*(log *n*) and min entropy *n*/2 − *O*(log *n*). Our protocol achieves the same Shannon entropy bound as the, also non-explicit, protocol of Gradwohl et al. (in: Dwork (ed) Advances in Cryptology, CRYPTO '06, 409–426, Technical Report, 2006), but achieves much higher min entropy: *n*/2 − *O*(log *n*) versus *O*(log *n*). Finally, we exhibit a very simple 3-round explicit “geometric” protocol with communication length *O*(*n*). We connect the security parameter of this protocol with the well-studied Kakeya problem motivated by Harmonic Analysis and Analytic Number Theory. We prove that this protocol has Shannon entropy *n* − *o*(*n*). Its relation to the Kakeya problem follows from a new approach to the random selection problem, different from any of the previously known protocols.

We show that the one-way ANOVA and Tukey–Kramer (TK) tests agree on any sample with two groups. This result is based on a simple identity connecting the Fisher–Snedecor and studentized range distributions and is proven without any additional assumptions; in particular, the standard ANOVA assumptions of independence, normality, and homoscedasticity (INAH) are not needed. In contrast, it is known that for a sample with k > 2 groups of observations, even under the INAH assumptions and with the same significance level α, the two tests may give opposite results: (i) ANOVA rejects its null hypothesis H₀ᴬ: μ₁ = … = μₖ, while the TK hypothesis H₀ᵀᴷ(i, j): μᵢ = μⱼ is not rejected for any pair i, j ∈ {1, …, k}; (ii) the TK test rejects H₀ᵀᴷ(i, j) for some pair (i, j) with i ≠ j, while ANOVA does not reject H₀ᴬ. We construct two large infinite pseudo-random families of samples of both types satisfying INAH: in case (i) for any k ≥ 3 and in case (ii) for some larger k. Furthermore, in case (ii), ANOVA restricted to the pair of groups (i, j) may reject the equality μᵢ = μⱼ with the same α. This is an obvious contradiction, since μ₁ = … = μₖ implies μᵢ = μⱼ for all i, j ∈ {1, …, k}. Such contradictions appear already in the symmetric case for k = 3, or in other words, for three groups of d, d, and c observations with sample means +1, −1, and 0, respectively. We outline conditions necessary and sufficient for this phenomenon. Similar contradictory examples are constructed for multivariable linear regression (MLR). However, for these constructions it seems difficult to verify the Gauss–Markov assumptions, which are standardly required for MLR.
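The identity behind the two-group agreement can be checked numerically: for k = 2 groups the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic (F(1, ν) is the distribution of t(ν)²). The data below are our own toy sample.

```python
import math

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

def mean(xs):
    return sum(xs) / len(xs)

ma, mb, mg = mean(a), mean(b), mean(a + b)
ssb = len(a) * (ma - mg) ** 2 + len(b) * (mb - mg) ** 2   # between groups
ssw = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
nu = len(a) + len(b) - 2
F = ssb / (ssw / nu)                                       # df = (1, nu)

sp2 = ssw / nu                                             # pooled variance
t = (ma - mb) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
assert abs(F - t ** 2) < 1e-12
```

Since the TK statistic for two groups is a fixed monotone transform of |t|, both tests reject for exactly the same samples, with no distributional assumptions on the data.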

Given n piles of tokens and a positive integer k ≤ n, the game Nim^1_{n,=k} of exact slow k-Nim is played as follows. Two players move alternately. In each move, a player chooses exactly k non-empty piles and removes one token from each of them. A player who has to move but has no move available loses in the normal version of the game and wins in the misère version. In Integers 20 (2020), #G3, Gurvich et al. gave an explicit formula for the Sprague–Grundy function of Nim^1_{4,=2}, for both its normal and misère versions. Here we extend this result and obtain an explicit formula for the P-positions of the normal versions of Nim^1_{5,=2} and Nim^1_{6,=2}.
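The P-positions studied here can be computed by brute force for small piles (this checker is ours, not the paper's closed-form formula): under normal play, a position is a P-position iff no move from it leads to a P-position (in particular, iff there is no move at all).

```python
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def is_p_position(piles):
    """Normal-play P-position test for exact slow 2-Nim."""
    piles = tuple(sorted(piles))
    nonempty = [i for i, p in enumerate(piles) if p > 0]
    for i, j in combinations(nonempty, 2):
        nxt = list(piles)
        nxt[i] -= 1          # remove one token from each of two piles
        nxt[j] -= 1
        if is_p_position(tuple(sorted(nxt))):
            return False     # some move reaches a P-position: N-position
    return True              # no move reaches a P-position: P-position

# Two piles: (a, b) is a P-position exactly when min(a, b) is even.
assert is_p_position((2, 2)) and is_p_position((0, 5))
assert not is_p_position((1, 1)) and not is_p_position((1, 2))
```

Tabulating `is_p_position` over 5- and 6-tuples is how one would sanity-check the explicit formulas of the paper.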

We consider a computational model known as set automata. A set automaton is a one-way finite automaton with an additional storage: the set. There are two kinds of set automata: deterministic (DSA's) and nondeterministic (NSA's). The model was introduced by Kutrib, Malcher, and Wendlandt in 2014. It was shown that DSA-recognizable languages resemble DCFL's and NSA-recognizable languages resemble CFL's.

In this paper, we extend this similarity and prove that languages recognizable by NSA's form a rational cone, as do CFL's.

The main topic of this paper is computational complexity: we prove that

- languages recognizable by DSA's belong to P, and there are P-complete languages among them;
- languages recognizable by NSA's are in NP, and there are NP-complete languages among them;
- the word membership problem is P-complete for DSA's without epsilon-loops and PSPACE-complete for general DSA's;
- the emptiness problem is PSPACE-complete for DSA's.

In this paper we study the separation between two complexity measures: the degree of a Boolean function as a polynomial over the reals and its block sensitivity. We show that the upper bound on the largest possible separation between these two measures can be improved from d²(f) ≥ bs(f), established by Tal [19], to d²(f) ≥ (√10 − 2)·bs(f). As a corollary, we show that similar upper bounds between some other complexity measures are not tight either; for instance, we can improve the recent sensitivity conjecture result of Huang [10], s⁴(f) ≥ bs(f), to s⁴(f) ≥ (√10 − 2)·bs(f). Our techniques are based on the paper by Nisan and Szegedy [14] and include a more detailed analysis of a symmetrization polynomial.

In our next result we show the same type of improvement for the separation between the approximate degree of a Boolean function and its block sensitivity: we show that deg₁/₃(f)² ≥ √(6/101)·bs(f), improving the previous result of Nisan and Szegedy [14], deg₁/₃(f) ≥ √(bs(f)/6). In addition, we construct an example showing that the gap between the constants in the lower bound and in the known upper bound is less than 0.2.

In our last result we study the properties of a conjectured fully sensitive function on 10 variables of degree 4, whose existence would improve the largest known gap between these two measures. Using a combination of interpolation and linear programming techniques, we prove that there is only one univariate polynomial that can arise as the symmetrization of such a function.
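Both measures can be computed by brute force on tiny examples, which makes the inequality d²(f) ≥ bs(f) concrete (all code here is ours): bs(f) is the maximum, over inputs x, of the number of disjoint blocks whose flipping changes f(x), and d(f) is the degree of the unique multilinear real polynomial representing f.

```python
from itertools import combinations, product

def flip(x, block):
    return tuple(1 - b if i in block else b for i, b in enumerate(x))

def bs_at(f, x, n):
    """Max number of disjoint sensitive blocks at input x (brute force)."""
    sensitive = [frozenset(s) for r in range(1, n + 1)
                 for s in combinations(range(n), r)
                 if f(flip(x, frozenset(s))) != f(x)]
    best = 0
    def grow(size, used, rest):
        nonlocal best
        best = max(best, size)
        for i, s in enumerate(rest):
            if not (s & used):
                grow(size + 1, used | s, rest[i + 1:])
    grow(0, frozenset(), sensitive)
    return best

def degree(f, n):
    """Degree of the multilinear polynomial of f via a Moebius transform."""
    deg = 0
    for s in product([0, 1], repeat=n):
        coeff = sum((-1) ** (sum(s) - sum(t)) * f(t)
                    for t in product([0, 1], repeat=n)
                    if all(tb <= sb for tb, sb in zip(t, s)))
        if coeff != 0:
            deg = max(deg, sum(s))
    return deg

OR3 = lambda x: int(any(x))
bs = max(bs_at(OR3, x, 3) for x in product([0, 1], repeat=3))
assert bs == 3 and degree(OR3, 3) == 3 and degree(OR3, 3) ** 2 >= bs
```

For OR on 3 bits both measures equal 3, so the inequality is far from tight here; the conjectured degree-4 function on 10 variables would be an extremal case in the other direction.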

The paper deals with the max-cut problem for random hypergraphs. We consider a binomial model of a random *k*-uniform hypergraph *H*(*n*, *k*, *p*) for some fixed k ≥ 3, growing *n*, and p = p(n). For a given natural number *q*, the max-*q*-cut of a hypergraph is the maximal possible number of edges that can be properly colored with *q* colors under the same vertex coloring. Generalizing the known results for graphs, we show that in the sparse case (when p = cn/(n choose k) for some fixed c > 0 not depending on *n*) there exists a limit constant γ(c, k, q) such that

max-q-cut(H(n, k, p))/n ⟶ᴾ γ(c, k, q)

as n → +∞. We also prove some estimates for γ(c, k, q) of the form Ak,q⋅c + Bk,q⋅√c + o(√c).
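For intuition, the quantity max-q-cut can be computed by brute force on a tiny hypergraph (the example is ours, and we read "properly colored" in the standard hypergraph max-cut sense: an edge counts when it is not monochromatic).

```python
from itertools import product

def max_q_cut(n, edges, q):
    """Max number of non-monochromatic edges over all q-colorings."""
    return max(sum(1 for e in edges
                   if len({coloring[v] for v in e}) > 1)
               for coloring in product(range(q), repeat=n))

# Complete 3-uniform hypergraph on 4 vertices.
edges = [(0, 1, 2), (1, 2, 3), (0, 2, 3), (0, 1, 3)]
assert max_q_cut(4, edges, 2) == 4   # two colors already cut every edge
assert max_q_cut(4, edges, 1) == 0   # one color leaves all edges monochromatic
```

The paper's result says that for sparse random k-uniform hypergraphs this quantity, normalized by n, concentrates around a constant γ(c, k, q).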