At the ISSA laboratory seminar Daniel Borchman gave a talk entitled: Probably Approximately Correct Computation of the Canonical Basis
Daniel Borchman, Postdoctoral Research Associate, Technische Universität Dresden
To learn knowledge from relational data, extracting functional dependencies is a common approach. A way to achieve this extraction is to convert the given data into so-called formal contexts and afterwards compute exact implicational bases of them. A particularly interesting such basis is the so-called canonical basis, which is not only a basis of minimal cardinality, but for which also algorithms are known that can perform well in practice. However, all these algorithms are of high runtime complexity, i.e., are not output-polynomial, and are thus likely to fail in certain situations. On the other hand, most data sets stemming from real world applications are faulty to a certain degree, and an exact representation of its implicational knowledge – as provided by the canonical basis – may not helpful anyway. Usual approaches of considering association rules instead of implications usually do not solve this problem satisfactorily, as they still require to compute exact implication bases.
This talk wants to investigate an alternative approach of learning approximations of implicational knowledge from data. For this, we revisit the notion of probably approximately correct implication bases (PAC bases), survey known approaches and results about the feasibility of computing such bases, and shall discuss first experimental results showing their usefulness. In particular, we shall show how methods from query learning can be leveraged to obtain an algorithm that allows to compute PAC bases in output polynomial time. Finally, we shall give an outlook how attribute exploration, an interactive learning approach based on querying domain experts, can be combined with PAC bases to obtain a probably approximately correct attribute exploration algorithm.