In the domain of web security, websites strive to protect themselves from data gathering performed by automated programs called bots. Crawler traps are an efficient brake against such programs: by dynamically generating similar pages or random content, a crawler trap feeds fake information to the bot, wasting its time and resources. To date, no available bot is able to detect the presence of a crawler trap. Our aim was to find a generic solution for escaping any type of crawler trap. Since the random generation is potentially endless, the only way to detect a crawler trap is on the fly. Using machine learning, it is possible to compare datasets of webpages extracted from regular websites with those generated by crawler traps. Since machine learning requires distances, we designed our system using information theory. We compared widely used distances with a new one designed to handle heterogeneous data: two pages do not necessarily share the same words, and it is operationally impossible to know all possible words in advance. Our new distance compares two webpages, and the results show that it is more accurate than the other distances tested. By extension, our distance has a much larger potential range of application than crawler trap detection alone, opening many new possibilities in data classification and data mining.
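The abstract does not specify the new distance, so as a hedged illustration only, here is a minimal Python sketch of one standard information-theoretic distance (Jensen-Shannon) over word-frequency distributions; taking the union vocabulary is one simple way to compare pages that do not share the same words. The function names and whitespace tokenization are our own assumptions, not the paper's method.

```python
from collections import Counter
from math import log2

def word_distribution(text):
    """Normalized word-frequency distribution of a page's text."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def js_distance(text_a, text_b):
    """Jensen-Shannon distance between two pages' word distributions.

    The union vocabulary handles heterogeneous data: a word absent
    from one page simply has probability 0 there.
    """
    p, q = word_distribution(text_a), word_distribution(text_b)
    vocab = set(p) | set(q)
    # Mixture distribution over the union vocabulary.
    m = {w: (p.get(w, 0.0) + q.get(w, 0.0)) / 2 for w in vocab}

    def kl(d):  # Kullback-Leibler divergence from d to the mixture m
        return sum(d[w] * log2(d[w] / m[w]) for w in d if d[w] > 0)

    jsd = (kl(p) + kl(q)) / 2          # divergence, in [0, 1] with log2
    return jsd ** 0.5                   # metric (distance) form
```

Identical pages get distance 0, pages with disjoint vocabularies get distance 1, and partially overlapping pages fall strictly in between.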
Abstract—Homing sequence derivation for nondeterministic finite state machines (NFSMs) has important applications in system testing and verification. Unlike prior methods based on explicit tree-based search, in this article we formulate the derivation of a preset/adaptive homing sequence in terms of quantified Boolean formula (QBF) solving. This formulation exploits a compact circuit representation of NFSMs and a QBF encoding of the existence condition of a homing sequence for effective computation. The implicit circuit representation avoids explicit state enumeration and can be more scalable. Different encoding schemes and QBF solvers are evaluated for their suitability for homing sequence derivation. Experiments on various computation methods and benchmarks show the generality and feasibility of the proposed approach.
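The QBF encoding itself is beyond the scope of a short sketch, but the object being derived can be illustrated by the kind of explicit search the paper's implicit approach replaces. Below is a hedged Python sketch of breadth-first preset homing sequence derivation for a complete deterministic FSM (the simplest setting, not the nondeterministic case the paper targets); the function names and the dictionary encoding of the FSM are illustrative assumptions.

```python
from collections import deque

def preset_homing_sequence(states, inputs, delta, lam):
    """Shortest preset homing sequence of a complete deterministic FSM.

    delta[(s, x)] is the next state and lam[(s, x)] the output for
    input x in state s. A search node is a set of blocks; each block
    holds the current states still indistinguishable by the outputs
    observed so far. The machine is homed when every block is a
    singleton: the output sequence then pins down the final state.
    Returns None if no homing sequence exists.
    """
    start = frozenset([frozenset(states)])
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        config, seq = queue.popleft()
        if all(len(b) == 1 for b in config):
            return seq
        for x in inputs:
            blocks = set()
            for block in config:
                by_out = {}  # split the block by the produced output
                for s in block:
                    by_out.setdefault(lam[(s, x)], set()).add(delta[(s, x)])
                blocks.update(frozenset(ns) for ns in by_out.values())
            nxt = frozenset(blocks)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + [x]))
    return None
```

The explicit successor enumeration here is exactly what blows up on large machines, which is the motivation for the paper's circuit-based QBF formulation.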
This talk deals with new, innovative data exfiltration techniques using laser printers. The aim is to understand the possibilities offered by laser printing to subliminally insert data on paper during printing with office printers. These techniques are similar to those used in side-channel or covert-channel attacks, which mainly target confidential environments requiring a high level of security (military, state, and industrial sectors). Using the print function, it is not only possible to hide a message (invisible to the naked eye) but also to decipher it easily once it is printed on a sheet of paper. The objective is to raise awareness of the need for strong security management of printers against unauthorized access in order to avoid data breaches: a simple piece of malware hooking the print queue may add confidential information to legitimate documents and organize the leakage of sensitive information. Demos of our techniques will be given during the talk, and source code will be released.
Most modern democracies and states have adopted a large number of standards and norms to promote and harmonize international trade. The precautionary principle has come to complete this regulatory arsenal, especially regarding the security of states and citizens, their health, and their private life. The aim is also to protect government agencies against wrong decisions, especially when uncertain, immature technologies are concerned. Social, political, and institutional security and stability, and now cybersecurity, have become heavily dependent on these new forms of regulation. In this article we show how this regulatory arsenal could be exploited by cybercriminals. It is indeed possible, through a broader vision of the notion of cyber attack, to turn these norms and standards, and the precautionary principle itself, precisely against those they are supposed to protect. Among many possible scenarios, we consider a specific one for illustration: an attack on voting machines. The main conclusion is that any (cyber)security risk analysis should now extend the mostly favoured technical view to a more operational vision in which non-technical aspects are also included.
Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows design errors and inconsistencies to be detected early and fixed before development starts. Such models are usually created using stepwise refinement: starting with a simple, abstract model of the system and then incrementally refining it, adding more details at each subsequent level of refinement. The top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B which can simplify the development of complicated models with a large gap between high-level design and implementation.
Designing a trusted access control mechanism for an operating system (OS) is a complex task if the goal is to achieve a high level of security assurance and guarantees of the absence of unwanted information flows. It becomes even more complex when the integration of several heterogeneous mechanisms, such as role-based access control (RBAC), mandatory integrity control (MIC), and multi-level security (MLS), is considered. This paper presents the results of the development of a hierarchical integrated model of access control and information flows (HIMACF), which provides a holistic integration of RBAC, MIC, and MLS while preserving the key security properties of all these mechanisms. The previous version of this model is called the MROSL DP-model. The model is now formalized using the Event-B formal method, and its correctness is formally verified. In the hierarchical representation of the model, each hierarchy level (module) corresponds to a separate security control mechanism, so the model can be verified with less effort by reusing the verification results of lower-level modules. The model is implemented in a Linux-based operating system using the Linux Security Modules infrastructure.
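As a purely illustrative sketch (not the HIMACF or MROSL DP-model rules), the following Python fragment shows the general flavour of conjoining heterogeneous mechanisms: an access is allowed only if both a role-based check and a Bell-LaPadula-style multi-level check permit it. All names, the permission strings, and the linear ordering of levels are simplifying assumptions; real MLS labels form a lattice.

```python
def mls_permits(subject_level, object_level, op):
    """Bell-LaPadula-style multi-level check: no read up, no write down.

    Levels are plain comparable integers here - a simplification of a
    real MLS lattice of labels.
    """
    if op == 'read':
        return subject_level >= object_level   # no read up
    if op == 'write':
        return subject_level <= object_level   # no write down
    raise ValueError(f'unknown operation: {op}')

def rbac_permits(user_roles, role_perms, perm):
    """Role-based check: some role held by the user grants the permission."""
    return any(perm in role_perms.get(r, ()) for r in user_roles)

def access_allowed(user_roles, role_perms, perm,
                   subject_level, object_level, op):
    """Integrated decision: every mechanism must agree, mirroring the
    conjunction of mechanisms in hierarchically integrated models."""
    return (rbac_permits(user_roles, role_perms, perm)
            and mls_permits(subject_level, object_level, op))
```

The point of the conjunction is that verifying each mechanism's properties separately (per hierarchy level) is enough, since a denial by any level denies the access.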
Distributed ledgers stimulate innovative services and enable new applications in several domains, creating new concepts of trust and regulation. However, this backbone that enables novelties and bridges businesses comes with drawbacks and security flaws. In this paper, we evaluate the features of several Distributed Ledger Technologies (DLTs), covering the Bitcoin, Ripple, Ethereum, Hyperledger, Algorand, and IOTA networks. We focus on their security challenges and expose numerous threats and vulnerabilities; for instance, we have simulated a few of the possible attacks, showing that these networks are not immune to them. On the other hand, we show a few of their malicious use cases. In meticulously presenting DLT menaces and flaws, we do not advocate any specific DLT network.
Trace models such as Finite State Machines (FSMs) are widely used in the analysis and synthesis of discrete event systems. FSM minimization is a well-known optimization problem which reduces the number of FSM states by deriving the canonical form that is unique up to isomorphism. In particular, the latter is a basis for FSM-based ‘black-box’ test derivation methods such as the W-method and its modifications. Since the behavior of many modern software and hardware systems significantly depends on time aspects, classical FSMs are extended with clock variables, including input and output timeouts. The minimization of a Timed Finite State Machine (TFSM) reduces both the state and time aspects. Existing approaches can derive the canonical representation for non-initialized deterministic TFSMs, while for initialized TFSMs, i.e., TFSMs with a designated initial state, several pairwise non-isomorphic minimal forms can exist. In this paper, we determine classes of initialized TFSMs for which the minimal form is unique up to isomorphism.
In this paper, we use a window approach to optimizing Finite State Machine (FSM) components of a multi-module system. Given a window with a loop-free binary composition of complete deterministic FSMs, we construct a partial FSM for the tail component such that any reduced form of this partial FSM can replace the tail component while preserving the composition behaviour. In a number of cases, using such a partial network equivalent instead of the initial component FSM simplifies the corresponding logic circuit with respect to the number of gates and the path length between primary inputs and outputs.
In this paper, we present the results of a deep analysis of the TOR routing protocol from a statistical and combinatorial point of view. We have exhaustively modelled all possible routes of this famous anonymity network, taking different parameters into account and using only the data provided by the TOR foundation. We have then confronted our theoretical model with reality on the ground: we generated thousands of routes on the actual TOR network and compared the results obtained with those predicted by the theory. A final step of combinatorial analysis enabled us to identify critical subsets of Onion Routers (ORs) on which 33%, 50%, 66% and 75% of the TOR traffic respectively depends. We have also managed to extract most of the TOR relay bridges, which are non-public nodes managed by the TOR foundation, and observed the same results as for the ORs.
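The paper's combinatorial analysis is not reproduced here, but the notion of a critical subset can be illustrated with a hedged greedy sketch: assuming routes pick relays proportionally to a selection weight (as TOR path selection weights relays by consensus bandwidth), the smallest set of heaviest relays covering a target fraction of the total weight approximates the set on which that share of traffic depends. The data layout and function name are our own assumptions.

```python
def critical_subset(relays, fraction):
    """Greedy approximation of a critical relay subset.

    relays: dict mapping relay identifier -> selection weight.
    Returns the smallest prefix of relays, taken in descending weight
    order, whose cumulative weight reaches `fraction` of the total -
    i.e. roughly that fraction of weight-proportional route choices
    depends on these relays.
    """
    total = sum(relays.values())
    chosen, acc = [], 0.0
    for name, weight in sorted(relays.items(), key=lambda kv: -kv[1]):
        if acc >= fraction * total:
            break
        chosen.append(name)
        acc += weight
    return chosen
```

With a heavily skewed weight distribution, as observed in practice, a small number of relays already covers large traffic fractions, which is what makes such subsets critical.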
This paper delves into the state of the art of computer virology formalisation and then tackles the development of a new malware algorithm. It details how the work leveraged blockchain technology to create an undetectable malware, presenting two versions of the new malware: a first, naive version, and then an advanced, armoured, undetectable k-ary malware that leverages decentralized storage, namely IPFS. The detection of the new malware algorithm has been proven NP-complete.
The volume consists of scientific and research papers of the Sixth International Conference “Actual Problems of System and Software Engineering” (APSSE-2019). The conference was held at the National Research University “Higher School of Economics” from November 12 to November 14, 2019 in Moscow, Russia. It was devoted to the analysis of the status, contemporary trends, research issues, and practical results obtained by national and foreign scientists and experts in system and software engineering, as well as in the development of information and analytical systems using Big Data technologies. The target audience of the conference comprised experts, students, and postgraduates working in the ordering, design, development, implementation, operation, and maintenance of information and analytical systems for various applications and their software, as well as in custom software development. Plenary papers were delivered by leading domestic and foreign specialists and were aimed at developing views on the most important and fundamental aspects of information technology development. Submitted articles were selected for publication; all of them were reviewed by the members of the Program Committee as well as by independent reviewers.
Abstract—State identification sequences, such as homing and distinguishing sequences (HS and DS), are widely used in FSM (Finite State Machine) based testing in order to reduce the size of a returned complete test suite as well as to minimize checking efforts in passive testing. Preset HS are known to always exist for deterministic complete reduced FSMs, but this is not the case for nondeterministic FSMs. It is also known that, for nondeterministic FSMs, adaptive HS exist more often and are usually shorter than preset ones. Nowadays, a number of specifications are represented by nondeterministic FSMs, and thus a deeper study of such sequences is required. Sufficient and necessary conditions exist for an adaptive HS of a complete nondeterministic FSM when each state can be an initial state, but those conditions become only sufficient for weakly initialized FSMs where only some states are initial. In this paper, we propose sufficient and necessary conditions for a weakly initialized FSM to have an adaptive homing sequence, possibly up to a given length, which are based on deriving an appropriate so-called homing FSM. An experimental evaluation of the existence of adaptive and preset HS is performed for randomly generated FSMs.
In this paper, we suggest a new approach to the schedulability problem for strictly periodic tasks (a periodic task is strict if it must be started at equal intervals of time, equal to the task's period). Given the permissible task periods, our approach quickly obtains all schedulable sets of tasks with such periods and immediately builds a conflict-free schedule for each obtained set. The approach is based on mathematical methods of graph theory and number theory. We illustrate the approach with a number of examples and present current practical results.
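As an illustration of the number-theoretic flavour of the approach (not the authors' actual algorithm), consider unit-length strictly periodic tasks on one processor: two such tasks collide if and only if their start offsets are congruent modulo the gcd of their periods, a standard consequence of the theory of linear congruences. A hedged Python sketch with assumed task encoding:

```python
from math import gcd

def conflict_free(tasks):
    """Check whether unit-length strictly periodic tasks can share a
    single processor without collisions.

    tasks: list of (offset, period) pairs; task i occupies the time
    slots offset_i + k * period_i for k = 0, 1, 2, ...
    Tasks i and j collide iff offset_i = offset_j
    (mod gcd(period_i, period_j)).
    """
    for i in range(len(tasks)):
        for j in range(i + 1, len(tasks)):
            (o1, p1), (o2, p2) = tasks[i], tasks[j]
            if (o1 - o2) % gcd(p1, p2) == 0:
                return False
    return True
```

For example, four tasks of period 4 with offsets 0, 1, 2, 3 are pairwise conflict-free, while two tasks of periods 2 and 4 sharing offset 0 collide at every multiple of 4.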
It is intuitively clear that search for scientific publications often has many characteristics of exploratory search. The purpose of this paper is to formalize this intuitive understanding: to explore which scientific search tasks can be classified as exploratory, what general approaches to the exploratory search problem exist, and how they are implemented in specialized search engines for scientists. We overview the existing works that address the information-seeking behavior of scientists and a special variant of search called exploratory search. There are several types of search behavior typical for scientists, and we show that most of them are exploratory. Exploratory search differs from information retrieval and requires special support from search systems. We analyze seventeen search systems for academics (from Google Scholar, Scopus, and Web of Science to ResearchGate) from the perspective of exploratory search support. We find that most of them do not go far beyond simple information retrieval, and there is room for further improvement, especially in collaborative search support.