The ever-rising use of software-based systems to fulfill safety-critical missions calls for methods to ensure the adequacy of these systems to their missions. Most industrial sectors have brought an answer in the form of standards such as EN 50128 for railways, DO 178C for aviation, ISO-26262 for automotive, or more generally IEC-61508 for critical systems. All these standards have in common the central role given to test-based validation techniques. However, alternate and usually complementary techniques, grouped under the banner of “Formal Methods”, are gaining increasing attention for the development and validation of these systems. In particular, the “Model Checking” technique has seen the multiplication of its application fields in the last decade.
This post is the second of a serie of four. You may also be interested by the first post: S3 Executive Summary
This renewed interest for a technique dating back to the 80s has three major explanations:
The emergence of efficient symbolic state space exploration techniques, particularly those based on solving the SAT, and more recently the SMT problems. The adoption of these techniques leads to a continuous increase in the power of model checking engines, enabling to tackle problems of increasing size and complexity.
The largely automatic and a posteriori nature of the analyses, enabling a lightweight integration in existing development processes, and its relative ease of use.
The multiplicity of missions than can be performed using these techniques: proof of safety properties, equivalence proof between various artifacts of a system, bug chasing, undefined code behaviors detection, dead code detection, automatic test case generation, but also the resolution of routing, optimization, or planning problems.
To address its customers needs, Systerel has developed a formal verification solution, Systerel Smart Solver (S3), able to perform an a posteriori proof of the safety of critical systems. This solution, combining a specialized modeling language with a SAT-based symbolic model checker, has shown to be particularly efficient in handling industrial-size critical systems coming from various domains such as railways or avionics. This series of blog articles describes the S3 solution.
It begins with an overly simplified introduction to formal verification to present the underlying concepts needed in the rest of this blog post (section Formal Verification In A Nutshell). The next post S3 Workflow gives a generic description of the typical -based workflow used to perform the formal safety verification of critical systems. In particular, it details the architecture and the techniques used to protect such a solution from potential faults in order to make it trustworthy.
The third post Instanciated Systems describes the application of the workflow to the formal safety verification of systems obtained through the instantiation of generic design principles using some given data. In particular the systems produced by the railway industry.
Formal Verification in a Nutshell¶
This section briefly presents some specific concepts of formal verification. To limit its size, the adopted point of view has been intentionally twisted toward that of -based symbolic model checking, at the expense of generality.
Formal Verification is the act of mathematically proving that a system respects some properties under a number of hypotheses. When such a property is proved to hold, it means that it is impossible to find an input scenario satisfying the hypotheses in which the system would falsify this property. A proof is complete whereas more classical verification activities, such as tests or reviews, are intrinsically incomplete. As stated by E.W. Dijkstra, “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence“.
In this document the term properties is restricted to the logical definition of safety 1 properties, stating that “something shall not happen“, as opposed to liveness properties, that state that “eventually something shall happen” which are difficult to analyze in the general case.
An important part of a formal safety verification solution is the definition of the safety specification against which the system will be verified. This formalized specification contains both the set of safety properties that the system shall respect, and a set of hypotheses made on the environment, in a form suitable for mathematical reasoning.
One of the most popular approach to formal verification is through Model Checking. In this technique, a model of the system 2 is used to verify that some given properties hold. Numerous methods and algorithms have been proposed to perform model checking, however this document will focus on a single class of model checking for a restriction of LTL to properties stating that some condition always holds (i.e. safety properties). Model checkers of this class verify that the properties hold for every possible system trace 3. Various technologies exists to explore the system traces (explicit state, BDD, SAT, SMT,… ), each resulting in some specific category of model checkers.
S3 falls into the category of SAT-based symbolic model checkers. These tools rely on specific algorithms, solving the problem of determining the satisfiability or unsatisfiability of a combinational boolean equation (the so-called problem), to carry the analyses needed in the exploration of the system traces.
Bounded Model Checking¶
Depending on the problem to solve, different analysis strategies of the model checker are used. Firstly, when searching for potential violations of the safety properties, a BMC strategy is used. In this analysis mode, traces of increasing length 4 are explored to search for potential falsifications of the safety properties. The search is bounded either by a user-provided trace length limit, or until time or memory ressources are exhausted. The outcome of this exploration is either an input scenario leading to the violation of the safety property, which is called a counter-example, or the insurance that there exists no violating trace of length less than a given constant. With its ability to exhibit problematic scenarios, the strategy is mainly used for debugging the system and the safety specification, or generating test cases.
When the proof of a safety property is needed, it becomes necessary to reason about unbounded time. Several methods can be used for that, but the most simple and straightforward is Induction Over Time. In this strategy, the model checker will attempt to find a proof of a safety property in two phases. First, in the base, it verifies that the property holds initially. Next, in the step, it verifies that, for every state of the system in which the property holds, the property also holds in all of its successor states 5. If both the base and the step hold, it can be concluded, applying the induction principle, that the property always holds.
Unfortunately, induction alone is sometimes insufficient for proving a property: it is not a complete proof method. For some properties, the model checker is neither able to find a falsification of the base, nor a proof of the step. This situation arises either because the property is falsifiable but the falsification occurs for traces longer than the ones analyzed by the model checker (in which case the user shall perform a BMC run up to that falsification), or because the property holds but cannot be proved using induction. The property is said to be non-inductive.
Suppose a system made of two boolean memories, one initially
and the other initially
false. At every cycle, both memories are
negated. Thus one memory follows the sequence:
false, true, false, true, …
while the other one follows the sequence:
true, false, true, false, …
Suppose that one wants to prove that, at any time, either the first
memory or the second is
true. This property is obviously true, and
a run of BMC will not find a falsification up to whatever depth it is
pushed. Using induction, the base trivially holds, but it is
impossible to verify the step. Indeed, starting from a state where
both memories are
true, for which the property holds, the
system reaches in one transition the state in which both memories are
false, for which the property does not hold. The model checker has
found a starting state in which the property holds, that leads to a
next state where it does not. The step fails to prove, and it cannot
be concluded that the property always holds. Of course, here it is
easy to see that the starting state cannot be reached from the initial
state of the system, but the induction strategy is not able to
determine that alone.
A scenario falsifying the step, driving the model from a state in which the property holds to one in which it does not, is called a step-counter-example. A state that cannot be reached from an initial state of the system is called an unreachable state.
Induction can be strengthened and generalized by considering longer prefixes in the proof of the step: the k-induction (\(k\geq0\)). In the base case, the model checker verifies that there exists no violating trace of length less than or equal to \(k+1\) for a given \(k\), and in the step, that, for any prefix of length \(k+1\) in which the property holds, the property also holds in the next states. The larger the \(k\), the more computing power is required from the model checker, but also the more properties it is able to prove.
The previous property that could not be proved with plain induction (0-induction) can be proved using 1-induction.
However, as for plain induction, \(k\)-induction is not a complete proof strategy, and some properties cannot be proved, whatever the value of \(k\) 6.
Methods have been developed to try to overcome this limitation of the induction principle, most of which rely on some kind of lemma generation strategy. A lemma is a relation that holds between variables of the system. It is thus similar in essence to a safety property, except that such a property is usually expressed only in terms of the inputs and outputs of the system whereas a lemma may also rely on internal variables, and a safety property characterizes some aspect of the safety of the system, whereas a lemma may be more general. Once proved 7, a lemma can be used as a known fact by the model checker to speed up its analyses and, most of all, to eliminate the unreachable states that prevent induction from succeeding.
Going back to the previous example, a simple lemma can be expressed saying that the two memories are opposite to one another. This lemma is easily proved (it is 0-inductive). Used in conjunction with the initial property, it eliminates the unreachable state from the exploration, which leads to a proof of the initial property.
A lemma that contributes to eliminate unreachable states preventing a property to be proved is called an induction strengthening lemma for the property.
A large number of lemmas can be automatically inferred either from the structure of the model, or more directed toward a specific non-inductive property. Lemmas can also be developed manually. Looking at the unreachable state found in a step-counter-example, together with the knowledge of the system design and especially of the principles ensuring its safety, it is usually rather simple to express a relation between variables of the model that eliminates this unreachable state. Iteratively eliminating unreachable states with lemmas obtained by manual investigation of successive step-counter-examples yields a powerful iterative methodology to easily reach the proof that a system respects its safety specification in a reasonable time.
Trustworthiness and Certification¶
When a formal safety verification solution is to be used as the mean to ensure the safety of a system in a trustful manner, for example because it is developed in a normative environment such as the EN 50128 or DO 178C, extra care is required in designing the architecture of the solution and its usage inside of the overall process in order to reach certification 8 of this process. This usually involves the use of techniques, processes, tools, tests, and independent reviews to assess:
the soundness of the system modeling ( it will not allow to prove properties that do not hold on the real system);
the correctness of the translations involved in the solution (translation from the actual system to its model, from this model to the model checker, and down to the underlying SAT solver);
the validity of the analyses performed by the model checker;
the completeness of the safety specification (the set of safety properties is sufficient to ensure the safety of the system);
the correctness of the safety specification (the expressions of the properties and constraints are correct, in particular the hypotheses cast upon the environment do not over-constrain the input scenarios).
It is also usually required that the safety specification shall not be developed by the design team, but by the validation team, or by some other independent team.
Contrarily to the safety specification, correctness and completeness of lemmas do not need to be assessed because they are always proved before being used. They are not part of the safety specification, just a help for the model checker. They can be developed in collaboration between the safety team and the design team. Having a badly expressed lemma will either prevent it from being proved, and thus used, or, if it is valid it might not play its role in the induction strengthening of a given property that will thus not be proved. Similarly, a missing lemma might prevent a property to be proved.
Using a formal safety verification solution brings numerous benefits to a development process:
Even before considering the verification activities, the solution is a very powerful debugger, giving a high momentum to a development process by dramatically shortening the generation / verification / correction cycles.
A formal safety verification is intrinsically complete, it will search for every possible falsifications, whereas human defined test campaigns and reviews are usually sample-based and incomplete.
It clearly identifies the complete list of assumptions ( the environment model) upon which the safety of the system relies.
A certified solution allows for a reduction of the testing and review efforts.
It is able to address the verification of the safety of a system taking into account failures, asynchronism, latencies, or other timing specificities between parts of a system that are usually beyond reach of traditional test-based verification solutions.
The use of formal verification in the verification of a system sends a strong and positive message to the market, and is sometime even a requirement for some customers.
The two following posts will expand on the different workflows and on the specific case of instanciated systems.
For commercial information and contact see the S3 product page.
This logical concept of safety is broader than the common definition used for example in safety critical systems.
Models are usually obtained by an automatic translation of a design or software, but they can also be created manually.
A system trace is the evolution of the system model on an infinite input scenario satisfying the hypotheses.
Only discrete time systems are considered, and the length is the number of time intervals.
The successor states are those that can be reached from the current state in one transition of the system.
It is possible to make the induction principle complete by imposing the unicity of states in the \(k+1\) prefix of the step-proof. However, in practice, this technique is hardly of any interest because of the size of the obtained formulas.
In any case, lemmas are always proved before being used.
In the rest of the document, the term “certification” will be used to denote either a certification required by a standard, or a mere trustworthiness requested on the formal safety verification solution.