This paper presents recent results of studies on applying sequence-based pattern structures and emerging patterns to the analysis of demographic sequences in Russia. The study uses data on 11 generations born between 1930 and 1984, drawn from the panel of three waves of the Russian part of the Generations and Gender Survey, which took place in 2004, 2007, and 2011. The main goal is to develop methods for extracting emerging patterns (EPs) under the following restriction: the obtained patterns must be (closed) frequent contiguous prefixes of the input sequences. This constraint was requested by demographers for proper interpretation and understanding of the early life-course events that lead to adulthood. To fulfil it, we use modified FP-trees based on pattern structures of contiguous prefixes. After EP extraction, we apply the CAEP (Classification by Aggregating Emerging Patterns) classifier to predict the gender of respondents from the demographic sequences of their first life-course events. The best results in terms of TPR-FPR are obtained for large values of the minimum growth-rate parameter (with some objects left unclassified).
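The growth-rate criterion behind emerging patterns can be sketched as follows. This is a minimal toy example with hypothetical life-course sequences, not the paper's data; the contiguous-prefix restriction from the abstract is what `is_prefix` encodes.

```python
# Hedged sketch: growth rate of a contiguous-prefix pattern between two
# classes, the quantity used to select emerging patterns (EPs).

def is_prefix(pattern, seq):
    """A pattern matches iff it is a contiguous prefix of the sequence."""
    return seq[:len(pattern)] == pattern

def support(pattern, sequences):
    return sum(is_prefix(pattern, s) for s in sequences) / len(sequences)

def growth_rate(pattern, pos, neg):
    """Support ratio of a pattern between the two classes."""
    sp, sn = support(pattern, pos), support(pattern, neg)
    if sn == 0:
        return float("inf") if sp > 0 else 0.0
    return sp / sn

# Toy sequences of first life-course events, one list per gender
women = [("education", "work", "marriage"), ("education", "marriage")]
men = [("work", "education"), ("education", "work")]

gr = growth_rate(("education", "work"), women, men)
```

A pattern qualifies as an EP for a class when its growth rate exceeds the chosen minimum growth-rate threshold; raising that threshold yields fewer but sharper patterns, which matches the trade-off reported in the abstract.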
This book constitutes the refereed proceedings of the 18th International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2016, held in Ershovo, Moscow, Russia, in October 2016.
The 16 revised full papers presented together with one invited talk and two keynote papers were carefully reviewed and selected from 57 submissions. The papers are organized in topical sections on semantic modeling in data-intensive domains; knowledge and learning management; text mining; data infrastructures in astrophysics; data analysis; research infrastructures; and a position paper.
Dualization of a monotone Boolean function on a finite lattice can be represented by transforming the set of its minimal 1-values to the set of its maximal 0-values. In this paper we consider finite lattices given by ordered sets of their meet and join irreducibles (i.e., as the concept lattice of a formal context). We show that in this case dualization is equivalent to the enumeration of so-called minimal hypotheses. In contrast to the usual dualization setting, where a lattice is given by the ordered set of its elements, dualization in this case is shown to be impossible in output-polynomial time unless P = NP. However, if the lattice is distributive, dualization is shown to be possible in subexponential time.
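The minimal-1-values-to-maximal-0-values transformation can be illustrated on the simplest lattice, the Boolean lattice of subsets. The sketch below is a brute-force enumeration for illustration only (exponential in n), and the input function is a made-up example, not one from the paper.

```python
from itertools import combinations

# Hedged sketch: dualization on the Boolean lattice 2^{0..n-1}.
# A monotone f is given by its minimal 1-values (minimal true sets);
# we enumerate its maximal 0-values by exhaustive search.

def f(x, min_ones):
    """Monotone function: true iff x contains some minimal true set."""
    return any(m <= x for m in min_ones)

def maximal_zeros(n, min_ones):
    universe = set(range(n))
    # all subsets on which f is false
    zeros = [set(c) for k in range(n + 1)
             for c in combinations(universe, k)
             if not f(set(c), min_ones)]
    # keep only the inclusion-maximal ones
    return [z for z in zeros if not any(z < other for other in zeros)]

min_ones = [{0, 1}, {2}]           # minimal true sets of a monotone f
maxzeros = maximal_zeros(3, min_ones)
```

Here f is true exactly on sets containing {0, 1} or {2}, so its maximal false sets are {0} and {1}; the hardness result in the paper concerns doing this enumeration when the lattice is encoded far more compactly, by its irreducible elements.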
PRIMARY THERAPY OF EARLY BREAST CANCER
Evidence, Controversies, Consensus
15th St. Gallen International Breast Cancer Conference
Vienna, Austria, 15–18 March 2017
In this paper, we present novel models for predicting the winning team and compare their accuracy with the TrueSkill model, which ranks individual players by their impact on team victory, for the two most popular online games: Dota 2 and Counter-Strike: Global Offensive.
Modern co-authorship networks contain hidden patterns of researchers' interaction and publishing activities. We aim to provide a system for selecting a collaborator for joint research or an expert on a given list of topics. We have improved a recommender system for finding possible collaborators with respect to research interests and for predicting the quality and quantity of the anticipated publications. Our system is based on a co-authorship network derived from a bibliographic database, together with content information on research papers obtained from SJR Scimago, staff information, and other features from open researcher profiles. We formulate the recommendation problem as weighted link prediction within the co-authorship network and evaluate its predictions for strong and weak ties in collaborative communities.
With recent technological advances, augmented reality systems and autonomous vehicles have gained a lot of interest from academia and industry. Both of these areas rely on scene geometry understanding, which usually requires depth map estimation. However, in the case of systems with limited computational resources, such as smartphones or autonomous robots, high-resolution dense depth map estimation may be challenging. In this paper, we study the problem of semi-dense depth map interpolation along with low-resolution depth map upsampling. We present an end-to-end learnable residual convolutional neural network architecture that achieves fast interpolation of semi-dense depth maps with different sparse depth distributions: uniform, sparse grid, and along intensity image gradients. We also propose a loss function combining the classical mean squared error with the perceptual loss widely used in intensity image super-resolution and style transfer tasks. We show that, with some modifications, this architecture can be used for depth map super-resolution. Finally, we evaluate our results on both synthetic and real data, and consider applications for autonomous vehicles and for creating AR/MR video games.
AIST is a scientific conference on Analysis of Images, Social Networks, and Texts. The conference is intended for computer scientists and practitioners whose research interests involve Internet mathematics and other related fields of data science. Similar to the previous year, the conference will be focused on applications of data mining and machine learning techniques to various problem domains: image processing, analysis of social networks, and natural language processing. We hope that the participants will benefit from the interdisciplinary nature of the conference and exchange experience.
As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse tools of natural language processing. While purely statistical approaches proved powerful and efficient for many NLP tasks, there are many applications that would benefit from the formal models and approaches traditional language science has to offer. With hopes to facilitate this interaction between theory and practical implementation, we are pleased to announce the workshop on Computational Linguistics and Language Science to be held in Moscow, Russia on April 25, 2016 (11 AM to 6 PM).
This paper provides the reader with a report on the 9th Russian Summer School in Information Retrieval (RuSSIR 2015).
We propose a new algorithm for consensus clustering, FCA-Consensus, based on Formal Concept Analysis. As input, the algorithm takes T partitions of a given set of objects obtained by the k-means algorithm in T runs from different initialisations. The resulting consensus partition is extracted from an antichain of the concept lattice built on a formal context objects × classes, where the classes are the set of all cluster labels from each initial k-means partition. We compare the results of the proposed algorithm, in terms of the ARI measure, with state-of-the-art algorithms on synthetic datasets. Under certain conditions, the best ARI values are demonstrated by FCA-Consensus.
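The objects × classes context at the heart of the method can be sketched as follows. The partitions are toy data and the concept-lattice step itself is omitted; the sketch only shows how T cluster labellings become binary attributes, under the assumption that attribute (t, c) reads "object falls into cluster c in run t".

```python
# Hedged sketch: building the formal context objects x cluster-labels
# from T k-means runs, the input of the FCA-based consensus step.

def build_context(partitions):
    """partitions[t][i] is the cluster label of object i in run t."""
    n = len(partitions[0])
    context = {i: set() for i in range(n)}
    for t, part in enumerate(partitions):
        for i, c in enumerate(part):
            context[i].add((t, c))   # attribute: (run index, cluster label)
    return context

partitions = [
    [0, 0, 1, 1],   # run 1
    [1, 1, 0, 0],   # run 2: same grouping, permuted labels
]
ctx = build_context(partitions)
```

Objects that every run places together end up with identical attribute sets, so they fall into the same formal concept regardless of how each run happened to number its clusters; that label-permutation invariance is what makes the context a natural consensus representation.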
This paper provides the reader with a report on the 10th Russian Summer School in Information Retrieval (RuSSIR 2016).
In this paper, an extension of tf-idf weighting to the annotated suffix tree (AST) structure is described. The new weighting scheme can be used for computing similarity between texts, which can further serve as an input to a clustering algorithm. We present preliminary tests of using ASTs for computing the similarity of Russian texts and show a slight improvement over the baseline cosine similarity after applying a spectral clustering algorithm.
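The baseline that the AST extension is compared against, plain tf-idf vectors with cosine similarity, can be sketched in a few lines. The documents are made-up token lists; the AST-based weighting itself is not reproduced here.

```python
import math
from collections import Counter

# Hedged sketch of the baseline: tf-idf vectors + cosine similarity.

def tfidf(docs):
    """docs: list of token lists -> list of sparse tf-idf vectors."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency
    return [{w: tf / len(d) * math.log(n / df[w])
             for w, tf in Counter(d).items()} for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["cat", "sat", "mat"], ["cat", "sat", "log"], ["dog", "ran"]]
vecs = tfidf(docs)
```

The pairwise cosine similarities form the affinity matrix that a spectral clustering algorithm, as mentioned in the abstract, would then take as input.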
In coming years, residential consumers will face real-time electricity tariffs with energy prices varying from day to day, and effective energy saving will require automation: a recommender system that learns the consumer's preferences from her actions. A consumer chooses a scenario of home appliance use to balance her comfort level and the energy bill. We propose a Bayesian learning algorithm that estimates the comfort-level function from the history of appliance use. In numerical experiments with datasets generated from a simulation model of a consumer interacting with small home appliances, the algorithm outperforms popular regression analysis tools. Our approach can be extended to control an air heating and conditioning system, which is responsible for up to half of a household's energy bill.
The 13th International Conference on Concept Lattices and Their Applications (CLA 2016) was held at the National Research University Higher School of Economics, Moscow, Russia, from July 18 until July 22, 2016. The CLA conference, organized since 2002, aims to provide everyone interested in Formal Concept Analysis and, more generally, in concept lattices or Galois lattices, with an advanced view of some of the latest research trends and applications in this field. It also aims to bring together students, professors, researchers and engineers involved in all aspects of the study of concept lattices, from theory to implementations and practical applications. As the diversity of the selected papers shows, there is a wide range of research directions around data and knowledge processing, including data mining, knowledge discovery, knowledge representation, reasoning, and pattern recognition, together with logic, algebra and lattice theory. The program of the conference includes four keynote talks given by the following distinguished researchers: Lev D. Beklemishev (Mathematical Institute of the Russian Academy of Sciences, Moscow), Jérôme Euzenat (INRIA Grenoble Rhône-Alpes), Bernhard Ganter (TU Dresden), and Boris G. Mirkin (National Research University Higher School of Economics, Moscow). This volume includes the selected papers and the abstracts of the invited talks. This year, 46 papers were submitted, of which 28 were accepted as regular papers. We would like to thank the contributing authors for their valuable work, and the members of the program committee and the external reviewers who analyzed the papers with care. All of them contributed to the continuing quality and importance of CLA, highlighting its key role in the field. We would also like to thank the steering committee of CLA for giving us the opportunity to lead this edition of CLA, the conference participants for their participation and support, and the people in charge of the organization, especially Larisa I. Antropova, Ekaterina L. Chernyak, Dmitry I. Ignatov, and Olga V. Maksimenkova, whose help was very precious on many occasions and who contributed to the success of the event. We would like to thank our sponsors, namely the National Research University Higher School of Economics, the ExactPro company, and the Russian Foundation for Basic Research. Finally, we also do not forget that the conference was managed (quite easily) with the EasyChair system for many tasks, including paper submission, selection, and reviewing.
This is the first textbook on attribute exploration, its theory, its algorithms for applications, and some of its many possible generalizations. Attribute exploration is useful for acquiring structured knowledge through an interactive process, by asking queries to an expert. Generalizations that handle incomplete, faulty, or imprecise data are discussed, but the focus lies on knowledge extraction from a reliable information source.
The method is based on Formal Concept Analysis, a mathematical theory of concepts and concept hierarchies, and uses its expressive diagrams. The presentation is self-contained. It provides an introduction to Formal Concept Analysis with emphasis on its ability to derive algebraic structures from qualitative data, which can be represented in meaningful and precise graphics.
Pattern structures are known to provide a tool for predictive modeling and classification. However, in order to generate classification rules, a concept lattice should be built, and this procedure may take much time and resources. In previous work, it was shown that this problem can be avoided with the so-called lazy associative classification algorithm, which does not require lattice construction and is applicable to classification problems such as credit scoring. In this paper, we adapt this method to the case of a continuous target variable, i.e. a regression problem, and apply it to recovery-rate forecasting. We tune the parameters, assess the accuracy of the algorithm on bank data, and compare it to the models adopted in the bank's system and to other benchmarks.
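The lazy, lattice-free idea can be sketched for the continuous-target case with interval pattern structures. This is an illustrative adaptation under stated assumptions, not the paper's exact procedure: numeric descriptions meet componentwise into intervals, and the prediction averages the targets of training objects whose meet with the test object stays narrow; `max_width` and the toy data are invented for the example.

```python
# Hedged sketch: lazy prediction of a continuous target with
# interval pattern structures (no concept lattice is built).

def meet(x, y):
    """Meet of two numeric descriptions: componentwise intervals."""
    return [(min(a, b), max(a, b)) for a, b in zip(x, y)]

def lazy_predict(test, train, targets, max_width=1.0):
    """Average the targets of objects similar to `test` in every component."""
    votes = [t for x, t in zip(train, targets)
             if all(hi - lo <= max_width for lo, hi in meet(test, x))]
    return sum(votes) / len(votes) if votes else None

train = [[1.0, 2.0], [1.2, 2.1], [5.0, 9.0]]
targets = [0.30, 0.40, 0.90]          # e.g. toy recovery rates
pred = lazy_predict([1.1, 2.0], train, targets)
```

Because each test object is handled by direct meets with the training descriptions, the expensive global lattice construction mentioned in the abstract is never performed.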
The mind-mapping approach is acknowledged as a fruitful collaborative educational technique. However, there is a lack of research on students' experience of learning with mind maps. Nowadays, information technologies develop and spread rapidly, and digital mind maps are becoming more and more popular. The process of their creation is strongly supported by various software tools, but little is known about applying this software to educational needs. This paper aims to fill this gap. The comprehension of mind-mapping adoption is implemented in the form of a pedagogical reflection. The data for the pedagogical reflection were gained from a study designed in a mixed methodology: the combination of a survey and participant observation aimed to collect data on students' perception and estimation of mind mapping. The survey questionnaire was developed based on the technique's functions and the results of the participant observation. The analysis highlighted that Coggle may be confidently used as educational software for supporting in-class and at-home collaborative mind-mapping activities. As a result, a set of recommendations for teaching with mind maps was developed. Directions for further work are discussed.
A linguistic method for determining whether a given text is a rumor or disinformation is proposed, based on web mining and a linguistic technology for comparing two text fragments. We hypothesize a family of content-generation algorithms capable of producing deception from a portion of genuine, original text. We then propose a disinformation detection algorithm that finds a candidate source of the text on the web and compares it with the given text, applying parse thicket technology. A parse thicket is a graph combined from a sequence of parse trees augmented with inter-sentence relations for anaphora and rhetorical structures. We evaluate our algorithm in the domain of customer reviews, considering a product review as an instance of possible deception. The method is confirmed as a plausible way to detect rumor and deception in a web document.