Dualization of a monotone Boolean function on a finite lattice can be represented as transforming the set of its minimal 1-values into the set of its maximal 0-values. In this paper we consider finite lattices given by the ordered sets of their meet and join irreducibles (i.e., as concept lattices of formal contexts). We show that in this case dualization is equivalent to the enumeration of so-called minimal hypotheses. In contrast to the usual dualization setting, where a lattice is given by the ordered set of its elements, dualization in this case is shown to be impossible in output-polynomial time unless P = NP. However, if the lattice is distributive, dualization is shown to be possible in subexponential time.
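For intuition, the transformation the abstract describes can be sketched by brute force on small examples (this is an illustrative enumeration over the Boolean cube, not the paper's algorithm, which works on lattices given by their irreducibles):

```python
from itertools import product

def dualize(min_true, n):
    """Brute-force dualization of a monotone Boolean function on {0,1}^n.

    min_true: the minimal 1-values, given as frozensets of variable indices.
    Returns the maximal 0-values (the dual antichain)."""
    def f(point):
        # monotone f: 1 iff the point lies above some minimal 1-value
        return any(t <= point for t in min_true)

    points = [frozenset(i for i in range(n) if bits[i])
              for bits in product((0, 1), repeat=n)]
    zeros = [p for p in points if not f(p)]
    # a 0-value is maximal if no other 0-value strictly contains it
    return {p for p in zeros if not any(p < q for q in zeros)}
```

For example, with minimal 1-values {0, 1} and {2} on three variables, the maximal 0-values are {0} and {1}. Brute force takes 2^n steps, which is exactly why the complexity of dualization in the various input representations discussed in the paper matters.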
We propose a new algorithm for consensus clustering, FCA-Consensus, based on Formal Concept Analysis. As input, the algorithm takes T partitions of a certain set of objects obtained by the k-means algorithm after T runs from different initialisations. The resulting consensus partition is extracted from an antichain of the concept lattice built on a formal context objects × classes, where the classes are the set of all cluster labels from each initial k-means partition. We compare the results of the proposed algorithm in terms of the ARI measure with state-of-the-art algorithms on synthetic datasets. Under certain conditions, the best ARI values are demonstrated by FCA-Consensus.
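The construction of the objects × classes context from the T base partitions can be sketched as follows (a minimal sketch assuming partitions are given as label lists; the attribute names and input shape are illustrative, not the paper's notation):

```python
def partitions_to_context(partitions):
    """Build a formal context objects x cluster-labels from T partitions.

    partitions: a list of T label lists, one per k-means run.
    Returns a dict mapping each object index to its set of attributes,
    where an attribute (run, label) means "was assigned label in that run"."""
    n = len(partitions[0])
    context = {obj: set() for obj in range(n)}
    for run, labels in enumerate(partitions):
        for obj, lab in enumerate(labels):
            context[obj].add((run, lab))
    return context
```

Objects sharing many attributes in this context were co-clustered in many runs, which is the signal a concept-lattice-based consensus method can exploit.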
In this paper an extension of tf-idf weighting to the annotated suffix tree (AST) structure is described. The new weighting scheme can be used for computing similarity between texts, which can further serve as an input to a clustering algorithm. We present preliminary tests of using ASTs for computing similarity of Russian texts and show a slight improvement over the baseline cosine similarity after applying a spectral clustering algorithm.
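The baseline the abstract compares against can be sketched as plain tf-idf vectors with cosine similarity (a standard sketch over pre-tokenized documents; the AST-based weighting itself is not reproduced here):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Plain tf-idf over tokenized documents: the cosine-similarity baseline."""
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({w: (c / len(doc)) * math.log(n / df[w])
                     for w, c in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

The resulting pairwise similarity matrix is what a spectral clustering algorithm would then consume.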
In the coming years residential consumers will face real-time electricity tariffs with energy prices varying from day to day, and effective energy saving will require automation: a recommender system which learns the consumer's preferences from her actions. A consumer chooses a scenario of home appliance use to balance her comfort level and the energy bill. We propose a Bayesian learning algorithm to estimate the comfort level function from the history of appliance use. In numeric experiments with datasets generated from a simulation model of a consumer interacting with small home appliances, the algorithm outperforms popular regression analysis tools. Our approach can be extended to control an air heating and conditioning system, which is responsible for up to half of a household's energy bill.
The 13th International Conference on “Concept Lattices and Applications (CLA 2016)” was held at National Research University Higher School of Economics, Moscow, Russia from July 18 until July 22, 2016. The CLA conference, organized since 2002, aims to provide everyone interested in Formal Concept Analysis, and more generally in concept lattices or Galois lattices, an advanced view of some of the latest research trends and applications in this field. It also aims to bring together students, professors, researchers and engineers involved in all aspects of the study of concept lattices, from theory to implementations and practical applications. As the diversity of the selected papers shows, there is a wide range of research directions around data and knowledge processing, including data mining, knowledge discovery, knowledge representation, reasoning, pattern recognition, together with logic, algebra and lattice theory. The program of the conference includes four keynote talks given by the following distinguished researchers: Lev D. Beklemishev (Mathematical Institute of the Russian Academy of Sciences, Moscow), Jérôme Euzenat (INRIA Grenoble Rhône-Alpes), Bernhard Ganter (TU Dresden), Boris G. Mirkin (National Research University Higher School of Economics, Moscow). This volume includes the selected papers and the abstracts of the invited talks. This year, 46 papers were submitted, from which 28 papers were accepted as regular papers. We would like to thank the contributing authors for their valuable work, and the members of the program committee and the external reviewers who analyzed the papers with care. All of them contributed to the continuing quality and importance of CLA, highlighting its key role in the field. We would also like to thank the steering committee of CLA for giving us the opportunity to lead this edition of CLA, the conference participants for their participation and support, and the people in charge of the organization, especially Larisa I.
Antropova, Ekaterina L. Chernyak, Dmitry I. Ignatov, and Olga V. Maksimenkova, whose help was precious on many occasions and contributed to the success of the event. We would like to thank our sponsors, namely National Research University Higher School of Economics, the ExactPro company, and the Russian Foundation for Basic Research. Finally, we do not forget that the conference was managed (quite easily) with the EasyChair system for many tasks, including paper submission, selection, and reviewing.
This is the first textbook on attribute exploration, its theory, its algorithms for applications, and some of its many possible generalizations. Attribute exploration is useful for acquiring structured knowledge through an interactive process, by asking queries to an expert. Generalizations that handle incomplete, faulty, or imprecise data are discussed, but the focus lies on knowledge extraction from a reliable information source.
The method is based on Formal Concept Analysis, a mathematical theory of concepts and concept hierarchies, and uses its expressive diagrams. The presentation is self-contained. It provides an introduction to Formal Concept Analysis with emphasis on its ability to derive algebraic structures from qualitative data, which can be represented in meaningful and precise graphics.
Pattern structures are known to provide a tool for predictive modeling and classification. However, in order to generate classification rules, a concept lattice should be built, a procedure that may take much time and many resources. In previous work it was shown that this problem can be avoided with the so-called lazy associative classification algorithm, which does not require lattice construction and is applicable to classification problems such as credit scoring. In this paper we adapt this method to the case of a continuous target variable, i.e. the regression problem, and apply it to recovery rate forecasting. We perform parameter tuning, assess the accuracy of the algorithm on bank data, and compare it to the models adopted in the bank's system and other benchmarks.
The mind mapping approach is acknowledged as a fruitful collaborative educational technique. However, there is a lack of research on students’ experience during learning with mind maps. Nowadays, information technologies are developing and spreading rapidly; thus digital mind maps are becoming more and more popular. The process of their creation is strongly supported by various software tools, but little is known about applying this software to educational needs. This paper aims to fill that gap. The comprehension of mind mapping adoption is implemented in the form of pedagogical reflection. The data for the pedagogical reflection were gained from research designed in a mixed methodology: the combination of a survey and a participant observation aimed to collect data on students' perception and estimation of mind mapping. The survey's questionnaire was developed based on the technique's functions and the results of participant observation. The analysis highlighted that Coggle may be confidently used as educational software for supporting in-class and home collaborative activities on mind mapping. As a result, a set of recommendations for teaching with mind maps was developed. Directions for further work are discussed.
A linguistic method for determining whether a given text is a rumor or disinformation is proposed, based on web mining and a linguistic technology for comparing two text fragments. We hypothesize about a family of content generation algorithms which are capable of producing deception from a portion of genuine, original text. We then propose a disinformation detection algorithm which finds a candidate source of the text on the web and compares it with the given text, applying parse thicket technology. A parse thicket is a graph combined from a sequence of parse trees augmented with inter-sentence relations for anaphora and rhetorical structures. We evaluate our algorithm in the domain of customer reviews, considering a product review as an instance of possible deception. The approach is confirmed as a plausible way to detect rumor and deception in a web document.
We present a multiplayer first-person shooter (FPS) game with advanced intelligent non-playable characters (NPC) under computer control. The game is specially adapted for playing in a VR headset so that simulator sickness symptoms are significantly reduced.
The demo allows users to play with other human and NPC players in a shooter game made in Unreal Engine 4. Users can test their game skills against evolving human-like NPCs with a level-adjusting model. The humanness of the NPCs was verified with an Alan Turing game test, beating the 52% record from the BotPrize'12 competition.
This paper presents statistics from a controlled laboratory gift-exchange-game experiment. These numbers can be used for assumptions about human behavior in the analysis of noisy web data. The experiment was described in ‘The Impact of Social Comparisons on Reciprocity’ by Gächter et al. (2012). As already shown in the relevant literature from experimental economics, human decisions deviate from rational payoff maximization. The average gift rate was 31%, and the gift rate was not zero under any condition. Further, we derive some additional findings and calculate their significance.
Nowadays decision tree learning is one of the most popular classification and regression techniques. Though decision trees are not accurate on their own, they make very good base learners for advanced tree-based methods such as random forests and gradient boosted trees. However, applying ensembles of trees deteriorates the interpretability of the final model. Another problem is that decision tree learning can be seen as a greedy search for a good classification hypothesis in terms of some information-based criterion such as Gini impurity or information gain. But in the case of small data sets a global search might be possible. In this paper, we propose an FCA-based lazy classification technique where each test instance is classified with a set of the best (in terms of some information-based criterion) rules. In a set of benchmarking experiments, the proposed strategy is compared with decision tree and nearest neighbor learning.
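The lazy FCA-style scheme described above can be sketched roughly as follows: intersect the test instance's attributes with each training object to obtain candidate rule premises, score each premise's extent by an impurity criterion, and let the purest rule vote. This is a simplified sketch of the general idea, not the paper's exact procedure or tie-breaking:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label list; lower means a purer rule extent."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def lazy_classify(train, test_attrs):
    """Lazy classification sketch: no lattice is built in advance.

    train: list of (attribute_set, label) pairs.
    Each intersection with a training object yields a candidate rule
    premise; the rule with the lowest-impurity extent casts the vote."""
    best = None
    for attrs, _ in train:
        premise = attrs & test_attrs
        if not premise:
            continue
        # extent: labels of all training objects whose attributes contain the premise
        extent = [lab for a, lab in train if premise <= a]
        score = gini(extent)
        if best is None or score < best[0]:
            best = (score, Counter(extent).most_common(1)[0][0])
    return best[1] if best else None
```

Because each test instance triggers only pairwise intersections, the expensive construction of the whole concept lattice is avoided, which is the point the abstract makes.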
This book constitutes the thoroughly refereed proceedings of the 8th Russian Summer School on Information Retrieval, RuSSIR 2014, held in Nizhniy Novgorod, Russia, in August 2014.
The 14 papers presented were selected from various submissions. The papers focus on visualization for information retrieval along with other topics related to information retrieval.
Decision tree learning is one of the most popular classification techniques. However, by its nature it is a greedy approach to finding a classification hypothesis that optimizes some information-based criterion. It is very fast but may lead to finding suboptimal classification hypotheses. Moreover, in spite of decision trees being easily interpretable, ensembles of trees (random forests and gradient-boosted trees) are not, which is crucial in some domains, like medical diagnostics or bank credit scoring. In case of such “small, but important-data” problems one is not obliged to perform a greedy search for classification hypotheses, and therefore alternatives to decision tree learning techniques may be considered. In this paper, we propose an FCA-based classification technique where each test instance is classified with a set of the best (in terms of some information-based criterion) classification rules. In a set of benchmarking experiments, the proposed strategy is compared with decision tree and nearest neighbor learning.
We present two examples of how human-like behavior can be implemented in a model of a computer player to improve its characteristics and decision-making patterns in a video game. First, we describe a reinforcement learning model which helps to choose the best weapon depending on reward values obtained from shooting combat situations. Second, we consider obstacle-avoiding path planning adapted to a tactical visibility measure. We describe an implementation of a path smoothing model which allows the use of penalties (negative rewards) for walking through “bad” tactical positions. We also study path-finding algorithms such as the improved I-ARA* search algorithm for dynamic graphs, copying the human discrete decision-making model of reconsidering goals, similar to the PageRank algorithm. All the approaches demonstrate how human behavior can be modeled in applications with significant perception of intelligent agent actions.
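The reward-driven weapon choice described first can be sketched with a standard tabular value update and an epsilon-greedy policy (a generic reinforcement learning sketch; the function names, learning rate, and exploration scheme are illustrative assumptions, not the paper's model):

```python
import random

def update_q(q, weapon, reward, alpha=0.1):
    """Incremental value update for a weapon after a combat outcome.
    Standard tabular rule: q <- q + alpha * (reward - q)."""
    q[weapon] = q.get(weapon, 0.0) + alpha * (reward - q.get(weapon, 0.0))

def choose_weapon(q, weapons, eps=0.1, rng=random):
    """Epsilon-greedy choice: usually the best-valued weapon, sometimes explore."""
    if rng.random() < eps:
        return rng.choice(weapons)
    return max(weapons, key=lambda w: q.get(w, 0.0))
```

The exploration term is what keeps the NPC occasionally trying weapons with currently low estimated value, which also makes its behavior look less deterministic, and hence more human-like.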
Nowadays peer assessment is recognized as a crucial part of a wide range of active learning routines. Nevertheless, practitioners and educators speak of the complexity and high resource consumption of implementing this type of assessment. Undoubtedly, convenient software that supports peer assessment processes may substantially raise the productivity of its participants. A review of educational literature and free software shows there are several bottlenecks in the business processes of peer assessment and key user roles. First, most of the programs examined are web-based and expand the set of tools for teachers and learners with extra interfaces, which logically creates a new branch in the learning business process. Second, there is probably no peer assessment system which allows users to attach anything other than text to be reviewed. There is a gap in the market of free peer assessment software. This paper offers a peer assessment system specification that attempts to eliminate these disadvantages in order to improve user experience and thus increase the use of information technologies in peer assessment. The specification is based on a thorough description of the peer assessment process involving complex artifacts and double-blind peer review. Software called PASCA (peer assessment system for complex artifacts) is introduced to illustrate the specification. PASCA uses habitual e-mail services and does not affect other business processes. It supports standard features like blinding and randomization, and it provides a set of original features: evaluation of arbitrary artifacts, creation of complex peer review forms with validation and scoring, and easy analysis of data from peer assessment sessions.
The Lambek calculus can be considered as a version of non-commutative intuitionistic linear logic. One of the interesting features of the Lambek calculus is the so-called “Lambek’s restriction,” that is, the antecedent of any provable sequent should be non-empty. In this paper we discuss ways of extending the Lambek calculus with the linear logic exponential modality while keeping Lambek’s restriction. We present several versions of the Lambek calculus extended with exponential modalities and prove that those extensions are undecidable, even if we take only one of the two divisions provided by the Lambek calculus.
Nowadays data sets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications, ranging from healthcare to marketing. In this work, we focus on the analysis of “complex” sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of formal concept analysis and its extension based on “pattern structures”. Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures along with projections (i.e. a data reduction of sequential structures) are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analysing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case, which is the main motivation for this work.
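The subsumption operation on sequences mentioned above can be illustrated with the most common choice of order, the subsequence relation, together with a simple prefix projection (an illustrative sketch: the paper's actual subsumption and projections on complex medical sequences are richer than this):

```python
def is_subsequence(s, t):
    """Subsumption test: s <= t iff the elements of s occur in t in order
    (not necessarily contiguously). One natural partial order on sequences."""
    it = iter(t)
    # `x in it` advances the iterator, so matches must respect order
    return all(x in it for x in s)

def prefix_projection(seq, k):
    """A trivial projection: reduce a sequence to its length-k prefix.
    Projections shrink the search space at the cost of some pattern detail."""
    return tuple(seq[:k])
```

A pattern structure's meet on sequences is then built from such an order (e.g. via common subsequences), and projections like the one above trade pattern completeness for computing efficiency.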