Multi-Agent Smart Solver (MASS)

Systerel introduces the latest member of the Systerel Smart Solver family: Multi-Agent Smart Solver (MASS). MASS significantly reduces proof wall clock time for the user by running on extensible computing clusters.

Systerel Smart Solver is a SAT-based solution. After a short description of what SAT is, this article discusses the results and benefits of using MASS, a new cluster-based deployment of the Systerel Smart Solver model checker.

SAT, did you say SAT? For industrial applications?

SAT is short for the Boolean SATisfiability problem, also called the propositional SATisfiability problem.

Solving a SAT problem for a Boolean formula means deciding whether some assignment of its variables exists such that the formula is TRUE (the formula is satisfiable) or whether, for every assignment, the formula evaluates to FALSE (the formula is unsatisfiable).

SAT is an NP-complete decision problem. This means that, as of today, no known algorithm can solve every SAT problem efficiently, that is, in time polynomial in the size of the formula.
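To make the definition concrete, here is a minimal brute-force sketch in Python. It is only an illustration of the decision problem, not how Systerel Smart Solver works: the function name `brute_force_sat` and the clause encoding (lists of signed integers, in the style of the DIMACS format) are assumptions chosen for this example. It tries every assignment, which takes exponential time as the number of variables grows.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment as a dict, or None if unsatisfiable.

    clauses: a formula in conjunctive normal form; each clause is a list of
    non-zero integers, where k means "variable k" and -k means "NOT variable k".
    """
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        # The formula is TRUE if every clause has at least one true literal.
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return assignment
    return None


# (x1 OR x2) AND (NOT x1 OR x2): satisfiable, any solution needs x2 = TRUE.
sat_example = brute_force_sat([[1, 2], [-1, 2]], num_vars=2)

# x1 AND (NOT x1): unsatisfiable, no assignment makes it TRUE.
unsat_example = brute_force_sat([[1], [-1]], num_vars=1)
```

With n variables this loop visits up to 2^n assignments, which is why practical solvers rely on heuristics rather than enumeration.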

Systerel Smart Solver nonetheless relies on SAT technology, thanks to heuristics in the model checker that can tackle most of our customers' industrial-grade problems, especially in the railway signaling domain. Our model checker is used to study safety properties of complex and critical systems such as interlocking stations (urban and main line) or wayside CBTC Zone Controllers.

Systerel invests continuously to improve the performance of its model checker and to offer a best-in-class service to industry. One key performance indicator is the proof wall clock time for the user. The complexity of critical systems keeps increasing over time, and the model checker keeps improving to address bigger models. Moreover, the faster the proof is delivered, the more it contributes to shortening the commissioning schedules of critical systems.

It goes without saying that this performance challenge shall not compromise the safety demonstration.

The first direction consists of improving solver performance while preserving determinism. SAT solvers are mathematically chaotic, so determinism comes at the price of slightly lower performance. This trade-off is mandatory to preserve the means to replay and analyze a given behaviour, i.e. to offer industrial-grade maintenance. The design decisions of Systerel Smart Solver take this very seriously. Our portfolio design, which orchestrates the collaboration of several solvers, illustrates this trade-off.

Another direction is to distribute the proof when groups of proof obligations can be analysed independently. Multi-Agent Smart Solver (MASS) is our first step in this direction. MASS distributes analyses over several agents, each agent using several cores of its machine to run a portfolio.

In order to ease MASS adoption, Systerel considered the following constraints:

  • the solution shall be deployable on owned clusters
  • the solution shall be deployable on cloud rented clusters
  • the solution shall reduce proof duration from the user point of view

Promising first results

MASS 1.0 has been tested against real industrial problems. The experiments yield significant information for the end user:

  • the solution scales up seamlessly,
  • the expected acceleration is provided,
  • the solution can provide additional flexibility when customers are dealing with high server demand.

Before sharing the results, here are a few definitions of the measured concepts:

  • Wall clock time: time measured by the user between the moment analyses begin and the end of the last analysis,
  • Total time: the sum of the durations of all analyses; Total time matches Wall clock time for a single-agent cluster,
  • Acceleration: the speed-up ratio compared to a single-agent run; Acceleration = Total time / Wall clock time,
  • Cluster load: usage ratio of the pool of agents, i.e. Acceleration / n, with n the number of agents in the cluster; it measures distribution efficiency; the closer to 1, the less idle time.
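The definitions above translate directly into a few lines of arithmetic. The sketch below is illustrative only: the function name `mass_metrics` and the sample durations are hypothetical and are not taken from the MASS experiments.

```python
def mass_metrics(analysis_durations, wall_clock_time, agent_count):
    """Compute Total time, Acceleration and Cluster load as defined above."""
    total_time = sum(analysis_durations)         # sum of all analysis durations
    acceleration = total_time / wall_clock_time  # Total time / Wall clock time
    cluster_load = acceleration / agent_count    # Acceleration / n
    return total_time, acceleration, cluster_load


# Hypothetical run: three analyses of 1 h, 3 h and 4 h on a 2-agent cluster,
# finished 5 h after the first analysis started.
total_time, acceleration, cluster_load = mass_metrics(
    analysis_durations=[1.0, 3.0, 4.0],
    wall_clock_time=5.0,
    agent_count=2,
)
```

Here Total time is 8 h, Acceleration is 1.6 and Cluster load is 0.8, meaning the two agents were idle 20% of the time on average.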

We first measured the analyses in their original order, i.e. as they were given in the industrial example. This provided the data needed to compute the Acceleration achieved by MASS with multiple agents compared to the original single-machine performance. From that original order, an improved order was then derived, and its results were compared to those of the original order. The aim was to confirm that heuristics on the ordering could be developed in the future to reach an optimal Acceleration.

We also compared the same analyses on clusters built from different agent types (Systerel HPC machines with 40 cores and 320 GB of RAM vs. cloud-rented machines with 60 cores and 240 GB of RAM).

Finally, we measured the influence of a new Reduction strategy added to Systerel Smart Solver.
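Why the order matters can be seen with a small simulation. The sketch below rests on two assumptions that the article does not confirm: analyses are dispatched in list order to whichever agent becomes free first, and the "improved" order is simply longest-first. The article does not say how the improved order was actually derived, and the durations are made up.

```python
import heapq


def simulate_wall_clock(durations, agent_count):
    """Wall clock time when analyses go, in list order, to the first free agent."""
    free_at = [0.0] * agent_count  # time at which each agent becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available agent
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical durations (hours) on a 2-agent cluster.
original_order = [2, 2, 2, 6]
longest_first = sorted(original_order, reverse=True)

wall_original = simulate_wall_clock(original_order, agent_count=2)
wall_improved = simulate_wall_clock(longest_first, agent_count=2)
```

In this toy case the original order ends with a long analysis running alone while the other agent sits idle (8 h, Cluster load 0.75), whereas starting the long analysis first fills both agents completely (6 h, Cluster load 1).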

Results follow:

Results

  • The original order leads to a Cluster load just under 80% and an Acceleration of ~7.9 from the user perspective.
  • The tailored order boosts Cluster load up to 99.9%, providing a maximal Acceleration (i.e. equal to the number of agents in the cluster).
  • The new Reduction strategy applied to the original order reduces the Total time by 20% and boosts Cluster load up to 94%.
  • Systerel HPC servers may be more time-effective than the rented cluster, but the Wall clock time difference between the two clusters is small enough to make renting acceptable, whether to absorb occasional peaks in demand, to gain flexibility, or even to stop investing in owned clusters.

With this first release, analyses taking 12+ days can now be addressed in slightly more than 1 day.

Resources

Regarding resource consumption, the cluster can be expanded further to drive down Wall clock time. Using 20 agents instead of the 10 used here would have significantly reduced the Wall clock time.

In a nutshell,

  • a run using the original order with MASS accelerates the proof by a factor of about 8
  • the learning curve on smaller similar problems makes it possible to significantly improve Acceleration
  • scaling up on rented clusters offers a credible extension of, or alternative to, the proof server infrastructure

Upcoming features

This first release of MASS offers solid progress. This is only a first step.

An additional benefit of distributing analyses with MASS, compared to a single process, is robustness against agent failures such as a machine powering down, with shorter recovery and completion times.

With some lean configuration work, our team can set up some automatic reporting (e.g. periodic mail progress report).

Our team can also help optimize your analyses and break down their complexity so that larger upcoming runs benefit from additional acceleration.

The MASS roadmap includes progress HMIs. It also includes learning features to automatically improve the ordering of analyses across similar systems, as well as competition management between analyses (e.g. to try incompatible strategies on the same analysis and stop them all once one has completed successfully).
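The competition idea (run several strategies on the same analysis and stop them all when one succeeds) can be sketched with Python's standard library. This is a generic illustration, not the planned MASS design: `run_strategy`, `race`, and the strategy names are hypothetical stand-ins, and threads with a cooperative stop flag are used here only to keep the example short.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_strategy(name, steps, stop_event):
    """Hypothetical stand-in for one strategy; checks the stop flag each step."""
    for _ in range(steps):
        if stop_event.is_set():
            return None  # abandoned because another strategy already finished
        time.sleep(0.01)  # placeholder for one unit of solving work
    return name


def race(strategies):
    """Run all strategies concurrently and return the name of the first to finish."""
    stop_event = threading.Event()
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = [
            pool.submit(run_strategy, name, steps, stop_event)
            for name, steps in strategies.items()
        ]
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        stop_event.set()  # ask the remaining strategies to stop
        winner = next(iter(done)).result()
    return winner


# Two made-up strategies: the second needs far fewer steps and should win.
winner = race({"strategy_a": 100, "strategy_b": 2})
```

A real deployment would have to stop solver processes on remote agents rather than threads, but the control flow (wait for the first completion, then cancel the rest) would be the same.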

For more information, please contact us.
