The chapter presents a systematic approach to teaching “Files and file systems” in theoretical courses. The approach is based on different levels of abstraction for the concepts introduced. We consider abstract files as mathematical objects, logical files as an implementation of abstract files in programming, and physical files as a mapping of logical files to the address space of storage media. Different levels of abstraction for file systems are introduced in the same way: logical file systems as collections of logical files, physical file systems as a mapping of logical file systems to storage media, and file systems as software tools that serve physical file systems. The abstraction hierarchy makes it natural to introduce a hierarchy of operations on files and file systems and to explain the functions performed by the file subsystems of operating systems. Logical structuring of the material helps listeners grasp the topic by distinguishing essential general ideas from secondary details.
Our submission to SV-COMP'21 is based on the software verification framework CPAchecker and implements an extension of the thread-modular approach. It considers every thread separately, but in a special environment that models thread interactions. The environment is expressed by projections of normal transitions in each thread. A projection contains a description of possible effects on shared data and synchronization primitives, as well as the conditions of its application. By adjusting the precision of the projections, one can balance the speed and the precision of the whole analysis.
Implementation on top of the CPAchecker framework allows combining our approach with existing algorithms and analyses. Evaluation on the sv-benchmarks suite confirms the scalability and soundness of the approach.
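As a rough illustration of the thread-modular idea (a sketch only, not the CPAchecker implementation), the following Python fragment models each thread by guarded updates on a shared state and closes one thread's reachable states under the other threads' projected moves; the counter example and all names are invented for illustration.

```python
# Sketch: thread-modular reachability over shared state. Each thread is
# abstracted by "projections": (guard, effect) pairs on the shared state.
def thread_modular_reach(init, own_steps, env_projections):
    """Shared states one thread can reach, closed under environment moves."""
    reach = {init}
    frontier = [init]
    while frontier:
        s = frontier.pop()
        for guard, effect in own_steps + env_projections:
            if guard(s):
                t = effect(s)
                if t not in reach:
                    reach.add(t)
                    frontier.append(t)
    return reach

# Two threads incrementing a shared counter bounded by 2: thread A's own
# step, plus thread B's behaviour folded in as an environment projection.
inc = (lambda s: s < 2, lambda s: s + 1)
print(thread_modular_reach(0, [inc], [inc]))  # {0, 1, 2}
```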
In the domain of web security, websites strive to protect themselves from data gathering performed by automatic programs called bots. Crawler traps are an efficient brake against this kind of program: by dynamically generating similar pages or random content, they feed fake information to the bot, wasting its time and resources. Currently, no available bot is able to detect the presence of a crawler trap. Our aim was to find a generic solution to escape any type of crawler trap. Since the random generation is potentially endless, the only way to detect a crawler trap is on the fly. Using machine learning, it is possible to compare datasets of webpages extracted from regular websites with those generated by crawler traps. Since machine learning requires distances, we designed our system using information theory. We compared widely used distances with a new one designed to take heterogeneous data into account. Indeed, two pages do not necessarily share the same words, and it is operationally impossible to know all possible words in advance. Our new distance compares two webpages, and the results showed that it is more accurate than the other distances we tested. By extension, our distance has a much larger potential range than crawler trap detection alone, which opens many new possibilities in data classification and data mining.
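Since the abstract does not define the new distance, here is only a baseline sketch of the information-theoretic setup it improves upon: the Jensen-Shannon divergence between the word distributions of two pages, with a vocabulary union to cope with pages that do not share words. The sample texts are invented.

```python
import math
from collections import Counter

# Baseline sketch (not the authors' new distance): an information-theoretic
# comparison of two webpages via the Jensen-Shannon divergence of their word
# distributions. Taking the union of vocabularies handles heterogeneous pages.
def js_divergence(text_a, text_b):
    pa, pb = Counter(text_a.split()), Counter(text_b.split())
    na, nb = sum(pa.values()), sum(pb.values())
    vocab = set(pa) | set(pb)
    a = {w: pa[w] / na for w in vocab}
    b = {w: pb[w] / nb for w in vocab}
    m = {w: (a[w] + b[w]) / 2 for w in vocab}
    def kl(p, q):
        return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab if p[w] > 0)
    return (kl(a, m) + kl(b, m)) / 2  # 0 = identical pages, 1 = disjoint words

print(js_divergence("free prize click here", "free prize click here"))  # 0.0
print(js_divergence("weather report today", "random trap gibberish"))   # 1.0
```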
Homing sequence derivation for nondeterministic finite state machines (NFSMs) has important applications in system testing and verification. Unlike prior methods based on explicit tree-based search, in this article we formulate the derivation of a preset/adaptive homing sequence in terms of quantified Boolean formula (QBF) solving. This formulation exploits a compact circuit representation of NFSMs and a QBF encoding of the existence condition of a homing sequence for effective computation. The implicit circuit representation avoids explicit state enumeration and can be more scalable. Different encoding schemes and QBF solvers are evaluated for their suitability for homing sequence derivation. Experiments on various computation methods and benchmarks show the generality and feasibility of the proposed approach.
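For context, here is a minimal sketch of the explicit-search baseline that the article moves away from, restricted to deterministic Mealy machines for brevity (the article itself targets NFSMs via QBF encodings). The toy machine is invented for illustration.

```python
from itertools import product

# Toy deterministic Mealy machine: delta[(state, input)] = (output, next_state).
# A preset homing sequence is an input word whose output word uniquely
# determines the final state, whatever the initial state was.
def is_homing(delta, states, word):
    outcomes = {}
    for s0 in states:
        s, outs = s0, []
        for x in word:
            o, s = delta[(s, x)]
            outs.append(o)
        outcomes.setdefault(tuple(outs), set()).add(s)
    # Homing: every observable output word pins down a single final state.
    return all(len(finals) == 1 for finals in outcomes.values())

def shortest_homing(delta, states, inputs, max_len=6):
    # Explicit breadth-first search over input words, shortest first.
    for n in range(1, max_len + 1):
        for word in product(inputs, repeat=n):
            if is_homing(delta, states, word):
                return word
    return None

delta = {
    (0, 'a'): (0, 1), (0, 'b'): (1, 0),
    (1, 'a'): (0, 2), (1, 'b'): (0, 0),
    (2, 'a'): (1, 0), (2, 'b'): (0, 2),
}
print(shortest_homing(delta, [0, 1, 2], ['a', 'b']))  # ('a', 'a')
```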
This talk deals with new, innovative data exfiltration techniques using laser printers. The aim is to understand the possibilities offered by laser printing for inserting data subliminally on paper during printing with office printers. These techniques are similar to those used in auxiliary-channel attacks (side-channel/covert-channel attacks), which mainly target confidential environments requiring a high level of security (military, state, industrial sectors). By using the print function, it is not only possible to hide a message (invisible to the naked eye) but also to decipher it easily once printed on a sheet of paper. The objective is to make people aware of the need for strong security management of printers against unauthorized access to avoid data breaches. The main reason lies in the fact that a simple malware hooking the print queue may enable confidential information to be added to legitimate documents and organize the leakage of sensitive information. Demos of our techniques will be given during the talk, and source code will be released.
Most modern democracies and states have adopted a large number of standards and norms to promote and harmonize international trade. The precautionary principle has come to complete this regulatory arsenal, especially regarding the security of states and citizens, their health, their private life, and so on. The aim is also to protect government agencies against wrong decisions, especially when uncertain, immature technologies are concerned. Social, political, and institutional security and stability, and now cybersecurity, have become heavily dependent on these new forms of regulation. In this article we show how this regulatory arsenal could be exploited by cybercriminals. Through a broader vision of the notion of cyber attack, it is indeed possible to turn these norms and standards, and this precautionary principle, precisely against those they are supposed to protect. Among many possible scenarios, we consider a specific one for illustration: an attack on voting machines. The main conclusion is that any (cyber)security risk analysis should now extend the mostly favoured technical view to a more operational vision in which non-technical aspects are also included.
Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows detecting design errors and inconsistencies early and fixing them before development starts. Such models are usually created using stepwise refinement: starting with a simple, abstract model of the system and then incrementally refining it by adding more details at each subsequent refinement level. The top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B which can simplify the development of complicated models with a large gap between high-level design and implementation.
Modern supervised algorithms assume that the dataset used for training has the same distribution as the data to be processed. However, real data is constantly changing. This leads to the gradual degradation of supervised machine learning algorithms in production systems and increases maintenance costs. To solve this problem, we focus on domain adaptation of machine learning algorithms in a lifelong manner. We assume that real unlabelled data arrives continuously. For this setting, we propose a method for detecting changes in data distributions and updating supervised algorithms. The idea behind the method is to process a portion of the data and create a new labelled dataset for training a supervised model. The trained model becomes part of an ensemble used for selecting a strategy to deal with new examples: assign the label automatically using co-training, or manually with the aid of active learning. This method is independent of the specific architecture of the model and can be used with any modern supervised algorithm, including artificial neural networks. Our research also confirms two findings. First, adding a small portion of data with reliable labels to a self-labelled dataset improves the model's performance, even if this amount is too small to build a model from scratch. Second, accumulating domain knowledge by continuously adding newly trained models to the ensemble used for labelling reduces the amount of labelled data required while maintaining the high performance of the adapted model.
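A minimal sketch of the labelling strategy described above, assuming an sklearn-style ensemble of previously trained models with predict_proba(); the confidence threshold and the human oracle are illustrative placeholders, not the authors' settings.

```python
import numpy as np

# Illustrative threshold: above it we trust the ensemble's self-labelling,
# below it the example is routed to active learning (a human oracle here).
CONF_THRESHOLD = 0.9

def label_batch(ensemble, X_batch, ask_human):
    """Split a batch into auto-labelled and human-labelled examples."""
    # Average the ensemble's class probabilities for each example.
    probs = np.mean([m.predict_proba(X_batch) for m in ensemble], axis=0)
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    # Low-confidence examples are labelled manually via active learning.
    for i in np.where(confidence < CONF_THRESHOLD)[0]:
        labels[i] = ask_human(X_batch[i])
    return labels
```

The batch labelled this way becomes the training set for a new model, which is then appended to the ensemble, so accumulated domain knowledge keeps reducing the share of examples that need manual labels.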
There is a high demand for practical methods and tools to ensure the total correctness of critical software components. A usual assumption is that the machine code (or binary code) generated by a compiler follows the semantics of the programming language. Unfortunately, modern compilers such as GCC and LLVM are too complex to be thoroughly verified, and bugs in the generated code are not uncommon. As an alternative approach, we suggest proving that the generated machine code still satisfies the functional properties expressed at the source code level. The suggested approach takes an ACSL-annotated C function along with its binary code and checks that the binary code meets the ACSL annotations. The main steps are as follows: (1) disassembling the machine code and extracting its semantics; (2) adapting the annotations to the machine level and generating the verification conditions. The implementation utilizes MicroTESK, Frama-C, Why3, and other tools. To illustrate the solution, we use the RISC-V microprocessor architecture; however, the approach is relatively independent of the target platform, as it is based on formal specifications of the instruction set. It is worth noting that the presented method can also be exploited as a test oracle for compiler testing.
Designing a trusted access control mechanism for an operating system (OS) is a complex task if the goal is to achieve a high level of security assurance and guarantees of the absence of unwanted information flows. It becomes even more complex when the integration of several heterogeneous mechanisms, like role-based access control (RBAC), mandatory integrity control (MIC), and multi-level security (MLS), is considered. This paper presents the results of the development of a hierarchical integrated model of access control and information flows (HIMACF), which provides a holistic integration of RBAC, MIC, and MLS preserving the key security properties of all those mechanisms. The previous version of this model is called the MROSL DP-model. Now the model is formalized using the Event-B formal method, and its correctness is formally verified. In the hierarchical representation of the model, each hierarchy level (module) corresponds to a separate security control mechanism, so the model can be verified with less effort by reusing the results of verification of lower-level modules. The model is implemented in a Linux-based operating system using the Linux Security Modules infrastructure.
Distributed ledgers stimulate innovative services and enable new applications in several domains, creating new concepts for trust and regulation. However, this backbone that enables novelties and bridges businesses comes with drawbacks and security flaws. In this paper, we evaluate the features of several Distributed Ledger Technologies (DLTs), covering the Bitcoin, Ripple, Ethereum, Hyperledger, Algorand, and IOTA networks. We focus on their security challenges and expose numerous threats and vulnerabilities. For instance, we have simulated a few possible attacks, showing that these networks are not immune. On the other hand, we show a few of their malicious use cases. While meticulously presenting DLT menaces and flaws, we do not advocate for any specific DLT network.
Trace models such as Finite State Machines (FSMs) are widely used in the analysis and synthesis of discrete event systems. FSM minimization is a well-known optimization problem that reduces the number of FSM states by deriving the canonical form, which is unique up to isomorphism. In particular, the latter is a basis for FSM-based ‘black-box’ test derivation methods such as the W-method and its modifications. Since the behavior of many modern software and hardware systems significantly depends on timing aspects, classical FSMs are extended with clock variables, including input and output timeouts. The minimization of a Timed Finite State Machine (TFSM) involves reducing both state and time aspects. Existing approaches allow one to derive the canonical representation for non-initialized deterministic TFSMs, while for initialized TFSMs, i.e., TFSMs with a designated initial state, several pairwise non-isomorphic minimal forms can exist. In this paper, we determine classes of initialized TFSMs for which the minimal form is unique up to isomorphism.
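To make the untimed core of the problem concrete, the sketch below implements classical partition refinement (Moore's algorithm) for complete deterministic FSMs; TFSM minimization extends this with the reduction of timeouts and clock aspects, which the sketch deliberately omits. The example machine is invented.

```python
# Classical state minimization of a complete deterministic Mealy machine,
# given as delta[(state, input)] = (output, next_state).
def minimize_states(states, inputs, delta):
    # Start by splitting states on their immediate output behaviour.
    def out_sig(s):
        return tuple(delta[(s, x)][0] for x in inputs)
    blocks = {}
    for s in states:
        blocks.setdefault(out_sig(s), []).append(s)
    partition = list(blocks.values())
    # Refine until successors respect the partition: states in one block
    # must move to the same block under every input.
    while True:
        index = {s: i for i, b in enumerate(partition) for s in b}
        def succ_sig(s):
            return tuple(index[delta[(s, x)][1]] for x in inputs)
        refined = {}
        for i, b in enumerate(partition):
            for s in b:
                refined.setdefault((i, succ_sig(s)), []).append(s)
        new_partition = list(refined.values())
        if len(new_partition) == len(partition):
            return new_partition  # blocks are the states of the minimal FSM
        partition = new_partition

# Equivalent states 1 and 2 collapse into one block:
delta = {(0, 'a'): ('x', 1), (1, 'a'): ('y', 0), (2, 'a'): ('y', 0)}
print(minimize_states([0, 1, 2], ['a'], delta))  # [[0], [1, 2]]
```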
In this paper, we use a window approach to optimizing Finite State Machine (FSM) components of a multi-module system. Given a window with a loop-free binary composition of complete deterministic FSMs, we construct a partial FSM for the tail component FSM such that any reduced form of this partial FSM can replace the tail component while preserving the composition behaviour. There are a number of cases when using a partial network equivalent instead of the initial component FSM makes it possible to simplify the corresponding logic circuit with respect to the number of gates and the path length between primary inputs and outputs.
These days, real-time analytics is one of the most frequently used notions in the world of databases. Broadly, this term means very fast analytics over very fresh data. The term usually comes together with other popular terms: hybrid transactional/analytical processing (HTAP) and in-memory data processing. The reason is that the simplest way to provide fresh operational data for analysis is to combine transactional and analytical processing in one system, and the most effective way to provide fast transactional and analytical processing is to store the entire database in memory. So, on the one hand, these three terms are related, but on the other hand, each of them has its own right to exist. In this paper, we provide an overview of several in-memory data management systems that are not HTAP systems. Some of them are purely transactional, some are purely analytical, and some support real-time analytics. We then overview nine in-memory HTAP DBMSs, some of which do not support real-time analytics. Existing real-time in-memory HTAP DBMSs have very diverse and interesting architectures, although they use a number of common approaches: multiversion concurrency control, multicore parallelization, advanced query optimization, just-in-time compilation, etc. Additionally, we are interested in whether these systems use non-volatile memory (NVM) and, if so, in what manner. We conclude that the emergence of a new generation of NVM will greatly stimulate its use in in-memory HTAP systems.
In this paper, we present the results of a deep analysis of the TOR routing protocol from a statistical and combinatorial point of view. We have exhaustively modelled all possible routes of this famous anonymity network, taking different parameters into account and using only the data provided by the TOR foundation. We have then confronted our theoretical model with the reality on the ground: we generated thousands of routes on the actual TOR network and compared the results obtained with those predicted by the theory. A final step of combinatorial analysis enabled us to identify critical subsets of Onion Routers (ORs) on which 33%, 50%, 66% and 75% of the TOR traffic respectively depends. We have also managed to extract most of the TOR relay bridges, which are non-public nodes managed by the TOR foundation, and observed the same results as for the ORs.
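An illustrative greedy reading of the “critical subset” computation, assuming relays are chosen roughly in proportion to their consensus bandwidth weights; the actual TOR path-selection model is more involved, and the weights below are made up.

```python
# Sketch: smallest set of relays carrying a given share of traffic, assuming
# selection probability proportional to consensus bandwidth weight.
def critical_subset(relay_weights, share):
    total = sum(relay_weights.values())
    subset, acc = [], 0.0
    for relay, w in sorted(relay_weights.items(), key=lambda kv: -kv[1]):
        subset.append(relay)
        acc += w
        if acc / total >= share:
            break
    return subset

weights = {"or1": 900, "or2": 500, "or3": 300, "or4": 200, "or5": 100}
for share in (0.33, 0.50, 0.66, 0.75):
    print(share, critical_subset(weights, share))
```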