The 3rd International Workshop "Formal Concept Analysis for Knowledge Discovery", 7 June 2019
Formal concepts have proved to be of great importance for knowledge discovery, both as a tool for concise representation of association rules and as a tool for clustering and constructing taxonomies. The FCA4KD workshop aims to bring together researchers working on diverse aspects of FCA-based knowledge extraction, with applications to fields such as Computer and Information Science, Linguistics, Life and Social Sciences, Bioengineering, Chemistry, etc.
Topics of Interest
Main topics of interest include, but are not limited to:
· concept lattices and related structures
· attribute implications and data dependencies
· data preprocessing
· redundancy and dimensionality reduction
· information retrieval
· association rules and other data dependencies
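As a minimal illustration of the first topic, the two derivation operators of a formal context and the formal concepts they induce can be sketched in a few lines. The toy context and all function names below are assumptions made for this example only; the brute-force enumeration is suitable only for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: each object is mapped to the attributes it has.
CONTEXT = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by all given objects (all attributes if none are given)."""
    result = set(ATTRIBUTES)
    for obj in objects:
        result &= CONTEXT[obj]
    return result


def common_objects(attributes):
    """Objects that have every one of the given attributes."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = common_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

Each returned pair is closed under both operators: the extent is exactly the set of objects sharing the intent, and the intent is exactly the set of attributes shared by the extent.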
A.V. Rodin, Institute of Philosophy RAS, "Truth and Justification in Knowledge Representation"
While traditional philosophical epistemology stresses the importance of distinguishing knowledge from true beliefs, formalising this distinction with standard logical means turns out to be problematic. In Knowledge Representation (KR) as a Computer Science discipline, this crucial distinction is largely neglected. A practical consequence of this neglect is that existing KR systems provide their users with knowledge that they cannot verify and justify by means of the system itself. In terms of traditional epistemology, what such a user gets is a certain (possibly true) belief but not knowledge sensu stricto.
Recent advances in the research area at the crossroads of computational mathematical logic, formal epistemology and computer science open new perspectives for an effective computational realisation of justificatory procedures in KR. After exposing the problem of justification in logic, epistemology and KR, we sketch a novel framework for representing knowledge along with relevant justificatory procedures, which is based on Homotopy Type Theory (HoTT) and supports representation of both propositional knowledge, aka knowledge-that, and non-propositional knowledge, aka knowledge-how or procedural knowledge. The default proof-theoretic semantics of HoTT allows for combining the two sorts of represented knowledge at the formal level by interpreting all permissible constructions as justification terms (witnesses) of associated propositions.
M. Yu. Bogatyrev, Tula State University (TSU), "Towards constructing multidimensional formal contexts on natural language texts"
The recent success of vector-based and graph-based models of text semantics suggests that semantics can be interpreted as a multidimensional notion. In this paper, a brief survey of such models is presented, and the idea of modeling multidimensional text semantics with multidimensional formal contexts is discussed. Several variants of realizing three-dimensional formal contexts, using conceptual graphs as the semantic model of a text, are presented. The investigations were carried out on abstracts of biomedical papers from the PubMed databases.
N.V. Shelov, Innopolis University, "Designing ontology for classification and navigation in Computer Languages Universe"
During the semicentennial history of Computer Science and Information Technologies, several thousand computer languages have been created. The computer language universe includes languages for different purposes (programming, specification, modeling, etc.). In each of these branches of computer languages it is possible to track several approaches (imperative, declarative, object-oriented, etc.), disciplines of processing (sequential, non-deterministic, distributed, etc.), and formalized models, such as Turing machines or logic inference machines. These observations justify the importance of an adequate classification of computer languages. Computer language paradigms are the basis for such a classification. They are based on joint attributes which allow us to differentiate branches in the computer language universe. We present our computer-aided approach to the problem of computer language classification and paradigm identification. The basic idea consists in the development of a specialized knowledge portal for automatic search and updating, providing free access to information about computer languages. The primary aims of our project are research into the ontology of computer languages and assistance in the search for appropriate languages for computer system designers and developers. The paper presents our vision of the classification problem, the basic ideas of our approach, the current state and challenges of the project, and the design of a query language (based on a combination of temporal, belief, and description logics augmented by FCA constructs: derivatives and concepts).
S.A. Nersisjan, MSU, "Fitting a mixture of distributions that are close to uniform on boxes"
Fitting mixture distributions is a widely used clustering approach with many applications in areas such as computer science, biology and medicine. Since in most cases there is no exact algorithm for global maximum likelihood (or maximum a posteriori) estimation of mixture distribution parameters, special local optimization techniques such as the EM algorithm are usually used.
In this work the EM algorithm was applied to a mixture of generalized Gaussian distributions, which played the role of a smooth approximation to the uniform distribution on a box with variable position and edge lengths. One of the advantages of this approach is interpretability: for each of the resulting clusters and each data feature, the algorithm outputs the corresponding range.
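The smooth box approximation described above can be sketched as follows. The abstract does not give the exact parameterization, so this sketch assumes the standard generalized Gaussian density with a shape parameter and independent features; all function and parameter names are hypothetical. Only the density and the E-step for a single point are shown, not the full EM procedure.

```python
import math


def generalized_gaussian_pdf(x, center, half_width, shape):
    """Density proportional to exp(-(|x - center| / half_width) ** shape).

    For large `shape` it approaches the uniform density on
    [center - half_width, center + half_width].
    """
    norm = 2.0 * half_width * math.gamma(1.0 + 1.0 / shape)
    return math.exp(-((abs(x - center) / half_width) ** shape)) / norm


def box_pdf(point, centers, half_widths, shape):
    """Product of per-feature densities: a smooth stand-in for a uniform box."""
    density = 1.0
    for x, c, a in zip(point, centers, half_widths):
        density *= generalized_gaussian_pdf(x, c, a, shape)
    return density


def responsibilities(point, weights, components, shape):
    """E-step for one point: posterior probability of each mixture component.

    `components` is a list of (centers, half_widths) pairs. The point is
    assumed to have non-zero density under at least one component.
    """
    joint = [w * box_pdf(point, c, a, shape) for w, (c, a) in zip(weights, components)]
    total = sum(joint)
    return [j / total for j in joint]
```

Under these assumptions, the range reported for a cluster and a feature would be the interval from center minus half-width to center plus half-width.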
The proposed approach can be considered a generalization of the previously studied problem of optimal box positioning, which can also be formulated as a problem in formal concept analysis, namely, the problem of finding an interval pattern concept of maximum extent size.
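To make the notion of an interval pattern concept concrete, here is a minimal sketch assuming numeric data given as tuples of equal length. The function names are hypothetical, and the sketch only builds one concept from a set of seed points; it does not search for the concept of maximum extent.

```python
def bounding_box(points):
    """Interval pattern of a non-empty point set: per-feature (min, max)."""
    return [(min(column), max(column)) for column in zip(*points)]


def extent(box, data):
    """All data points that fall inside the box, in their original order."""
    return [p for p in data if all(lo <= x <= hi for x, (lo, hi) in zip(p, box))]


def interval_pattern_concept(seed_points, data):
    """Concept generated by seed points, which are assumed to belong to `data`."""
    box = bounding_box(seed_points)
    return extent(box, data), box
```

The pair is closed: every point of the extent lies inside the box, and the seed points already attain the box boundaries, so the bounding box of the extent is the same box.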
D.V. Vinogradov, Federal Research Center "Informatics and Management" of the Russian Academy of Sciences, "Random similarities computed on GPGPU"
The paper describes an implementation of a very simple probabilistic algorithm for finding similarities between training examples using general-purpose computing on graphics processing units (GPGPU). The algorithm was programmed in OpenCL, and its capabilities were investigated using an AMD Radeon VII graphics card under Kubuntu Linux 18.04 LTS.
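The abstract does not spell out the algorithm. One simple reading, given here purely as an assumption, is to encode each training example as a bitset of attributes and to take the bitwise intersection of a random sample of examples as a candidate similarity. The sketch below is in plain Python rather than OpenCL, and its function names are hypothetical.

```python
import random


def random_similarity(examples, sample_size, rng):
    """Intersect the attribute bitsets of a random sample of examples."""
    sample = rng.sample(examples, sample_size)
    similarity = sample[0]
    for bits in sample[1:]:
        similarity &= bits
    return similarity


def supporters(similarity, examples):
    """Indices of all examples whose bitset contains the similarity."""
    return [i for i, bits in enumerate(examples) if bits & similarity == similarity]
```

Bitwise intersections of this kind are independent across samples, which is what would make such a scheme a natural fit for parallel execution on a graphics card.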
E.F. Goncharova, HSE, "Increasing the efficiency of packet classifiers based on closed descriptions"
The efficient representation of packet classifiers has become a significant challenge due to the rapid growth of data kept and processed in forwarding tables. In our work we propose two novel techniques for reducing the size of forwarding tables, in both length and width, by eliminating redundant bits and unreachable actions. We consider the task of forwarding a packet to the correct destination as a multinomial classification task. Thus, reducing the forwarding table size corresponds to a feature selection procedure with slight modifications. The presented techniques are based on computing closed descriptions and building decision trees for classification. The main challenge in applying decision trees to this task is processing overlapping rules. To overcome this challenge, we propose to apply the JSM hypothesis technique to eliminate the unreachable actions assigned to the overlapping rules. The experiments were conducted on data generated by the ClassBench software. The proposed approaches result in a significant decrease in the number of bits that should be included in the forwarding tables as features.
A.A. Neznanov, HSE, "Ontology Based Learning and FCA-based Approach in Automatic Item Generation"
In the report we discuss the current state of methodologies, methods and tools for automatic item generation (AIG) for knowledge assessment. The most interesting questions are the problem of developing specific learning ontologies for AIG optimization, the role of the Semantic Web and other knowledge technology stacks in education, and the implementation of adaptive and personalized learning.
We propose a specific ontology consisting of a thesaurus, scale definitions, term distinctions and formal contexts linked with thesaurus nodes and scales. Such an ontology helps to generate test items and provide an adaptive assessment of learning outcomes on several levels. We also discuss the architecture, requirements and basic components of a distributed software system for adaptive learning process support.