Formal Concept Analysis Research Toolbox
Formal Concept Analysis Research Toolbox (FCART) is intended to support research in the field of Data Science: classification, clustering, attribute exploration, and other data analysis tasks. The new system should provide:
- Universal integrated environment for knowledge and data engineers.
- Built-in set of universal research tools based on FCA and multimodal clustering techniques for working with object-attribute data representation.
- Additional tools for import/export of data and data preprocessing.
- Extensibility of the research tools on several levels: from internal scripts to plugins.
- Generation of rich, visually appealing reports.
Our goal is to create a universal integrated environment for knowledge and data engineers. FCART is constructed upon an iterative data analysis methodology and provides a built-in set of research tools based on universal Formal Concept Analysis techniques for working with object-attribute data representations.
- Iterative process of data analysis using FCA entities and methods.
- Separation of the processes of data querying (from various data sources), data preprocessing (of locally saved immutable snapshots), data analysis (in interactive visualizers of immutable analytic artifacts), and formalization of results (in a report editor).
- Explicit definition of analytic artifacts and their types. This allows checking the integrity of session data and provides the end user with links between artifacts.
- Integrated performance estimation tools.
- Integrated documentation of software tools and data analysis methods.
Some FCA entities appear to be fundamental to information representation. In FCART, we use the term “analytic artifact” to denote the definition of an abstract interface that describes an entity of the analytic process.
The basic artifact for FCA-based methods is the “formal context”, i.e., an object-attribute representation of a subject domain. Other important artifacts include the “concept lattice” and the “formal concept”.
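To make these three artifacts concrete, here is a minimal Python sketch of a formal context and a brute-force enumeration of its formal concepts. The toy data and all function names are illustrative assumptions and are not part of FCART; the enumeration closes every subset of objects, so it is suitable only for very small contexts.

```python
from itertools import combinations

# Toy formal context (illustrative data): each object maps to its attributes.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
all_attributes = set().union(*context.values())


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set(all_attributes)
    for obj in objects:
        result &= context[obj]
    return result


def objects_having(attributes):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    Brute force: exponential in the number of objects, so toy data only.
    """
    found = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            found.add((frozenset(extent), frozenset(intent)))
    return found


concepts = formal_concepts()
```

For this toy context the enumeration yields eight formal concepts, for example the pair with extent {duck, eagle} and intent {flies}. Ordering the concepts by inclusion of their extents gives the concept lattice.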
All artifact instances are linked by “origination”. For example, we can generate a concept lattice from a formal context. In this case, the formal context is the “origin artifact” of the lattice. Another example is a lattice and its “association rules”: the lattice is the origin of the rules. Every artifact instance is immutable, which means that an instance cannot be changed after creation but can be visualized in various ways.
If we have a predefined set of artifacts, in most cases we can use the term “artifact” instead of “artifact instance” without ambiguity. The collection of all artifacts in the current analytic cycle forms a so-called analytical session.
All types of artifacts are generated by solvers. Each solver requires one or more artifact instances of preassigned types as input and produces one artifact instance of a preassigned type as output.
Having predefined types of artifacts and links (assigned by solvers) between immutable artifact instances, we can check the integrity of the data of a particular analytical session. A session cannot lose any artifact instances or links without explicit user action, which guarantees its integrity.
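The relationship between artifacts, solvers, and sessions can be sketched in Python as follows. This is an illustrative model only, not FCART's implementation: the class names `Artifact`, `Solver`, and `Session`, the `is_consistent` check, and the sample solver are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Artifact:
    """Artifact instance with frozen fields; `origins` links it to its inputs."""
    kind: str
    payload: object
    origins: Tuple["Artifact", ...] = ()


@dataclass(frozen=True)
class Solver:
    """Maps inputs of preassigned kinds to one output of a preassigned kind."""
    name: str
    input_kinds: Tuple[str, ...]
    output_kind: str
    run: Callable[..., object]

    def __call__(self, *inputs: Artifact) -> Artifact:
        actual = tuple(artifact.kind for artifact in inputs)
        if actual != self.input_kinds:
            raise TypeError(f"{self.name} expects {self.input_kinds}, got {actual}")
        result = self.run(*(artifact.payload for artifact in inputs))
        # The solver assigns the origination links of the new artifact.
        return Artifact(self.output_kind, result, origins=inputs)


class Session:
    """Collection of the artifacts of one analytic cycle."""

    def __init__(self):
        self._artifacts = []

    def add(self, artifact):
        self._artifacts.append(artifact)
        return artifact

    def is_consistent(self):
        """True if every origin link points to an artifact stored in the session."""
        return all(
            any(origin is stored for stored in self._artifacts)
            for artifact in self._artifacts
            for origin in artifact.origins
        )


session = Session()
context = session.add(Artifact("formal context", {"duck": {"flies", "swims"}}))
list_attributes = Solver(
    name="list attributes",  # hypothetical solver, for illustration only
    input_kinds=("formal context",),
    output_kind="attribute set",
    run=lambda table: frozenset().union(*table.values()),
)
attributes = session.add(list_attributes(context))
```

Because the fields are frozen and only a solver sets `origins`, a consistency check reduces to verifying that every origin is still present in the session. Note that `frozen=True` protects only the fields themselves; a mutable payload such as the dictionary above would need its own immutable representation in a real system.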
An artifact visualizer is a special solver that generates a user-oriented visual representation of an input artifact instance. From a technical point of view, a visualizer produces an interactive or non-interactive window with some user interface elements. One artifact can have several kinds of visual appearance.
Usually, a visualizer is the last in a chain of solvers. However, we can get a visual representation of each artifact in a session. For example, the lattice browser generates a diagram of a lattice and allows the user to manipulate the diagram, but it does not generate new artifacts. We need to distinguish the generation of a new artifact from the drawing of an existing one for various purposes: working in batch mode, increasing the efficiency of long chains of solvers, benchmarking, etc.
A report is the final result of research. Every scientific environment must provide a rich-text report editor with additional functionality to avoid mistakes when converting and moving multiple results with metadata to an external editor. The main feature of the editor is the automatic insertion of a fully decorated artifact representation into the resulting report.
At the moment, we present a version of FCART in the form of a local Windows application.
We use Microsoft and Embarcadero programming environments and several programming languages (C++, C#, Delphi, Python, and others). For scripting we use Delphi Web Script and Python. The native executable (the core of the system) is compatible with Microsoft Windows 2000 and later and has no additional dependencies.
Another line of development is a Web version of the system based on the Microsoft .NET platform. The architecture and some key components are ready, but we will focus on Web development after finishing local version 0.9.
- Querying external data sources.
- Manipulating collections (with metadata) in LocalDB.
- Generating snapshots as the main artifacts in the form of many-valued contexts.
- Efficient building of a concept lattice for a binary context.
The current version of FCART consists of the following components.
- The core component, which includes:
  - multiple-document user interface of the research environment with a session manager and an extensions manager,
  - snapshot profiles editor (SHPE),
  - snapshot query editor (SHQE),
  - query rules database (RDB),
  - session database (SDB),
  - main part of the report builder.
- Local Data Storage (LDS) for preprocessed data.
- Internal solvers and visualizers.
- Additional plugins, scripts, and report templates.
Main FCA workflow
From the analyst's point of view, the basic FCA workflow in FCART has four stages. At each stage, the user can import or export any artifact or add it to a report.
- Filling the Local Data Storage (LDS) of FCART from various external SQL, XML, or JSON-like data sources. The query to an external source is described by an External Data Query Description (EDQD), which can be produced by an External Data Browser.
- Loading a data snapshot from local storage into the current analytic session (the snapshot is described by a Snapshot Profile). A data snapshot is a data table with annotated structured and text attributes, loaded into the system by accessing the LDS.
- Transforming the snapshot into a binary context (the transformation is described by a Scaling Query).
- Building and visualizing a formal concept lattice and other artifacts based on the binary context within the scope of the analytic session.
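The third stage, turning a many-valued snapshot into a binary context, can be illustrated with nominal scaling, one of the standard FCA scaling techniques. The sketch below is plain Python and does not reproduce FCART's Scaling Query format; the function name `nominal_scale` and the sample snapshot are assumptions made for the example.

```python
def nominal_scale(snapshot):
    """Turn a many-valued table into a binary context via nominal scaling.

    Every (attribute, value) pair becomes one binary attribute "attribute=value".
    """
    return {
        obj: frozenset(f"{attr}={value}" for attr, value in row.items())
        for obj, row in snapshot.items()
    }


# Illustrative snapshot: objects with many-valued attributes.
snapshot = {
    "doc1": {"language": "en", "year": 2013},
    "doc2": {"language": "ru", "year": 2013},
    "doc3": {"language": "en", "year": 2014},
}
binary_context = nominal_scale(snapshot)
```

Here `doc1` and `doc3` share the binary attribute `language=en`, while `doc1` and `doc2` share `year=2013`. The resulting binary context is the kind of input from which a concept lattice is built in the fourth stage.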
- FCART 0.9.5 – August 2014
- New version of Web-service for LDS – Summer 2014.
- FCART 1.0 – November 2014.