As computing systems increasingly involve concurrently acting autonomous agents, game-theoretic approaches are becoming widespread in computer science as a faithful modelling abstraction. These techniques can be used to reason about the competitive or collaborative behaviour of multiple rational agents or entities with distinct goals or objectives. We have developed a comprehensive set of techniques for verification and strategy synthesis for concurrent stochastic multi-agent games, covering finite and infinite horizon and a large class of probabilistic and reward objectives, as supported by the PRISM-games 3.0 software release.
Much of the work concerns zero-sum objectives, where a coalition of agents is aiming to maximise their expected reward, while the other agents aim to minimise this value. However, there are too limiting, as in many cases the agents will often have distinct, but not directly opposing goals, which cannot be modelled in a zero-sum fashion. We thus introduce equilibria, defined by a separate, independent objective for each agent. These are particularly attractive since they ensure stability against deviations by individual agents, improving the overall system outcomes. We consider Nash and correlated equilibria, and develop algorithms for their (approximate/exact) verification and synthesis for concurrent stochastic games. In a Nash equilibrium no agent has an incentive to deviate unilaterally from their strategy. Correlated equilibria, in which agents are able to coordinate through public signals, are easier to compute than Nash and can yield better outcomes.
The image shows possible equilibria for a parking game. We consider a cost measure based on the distance each car needs to travel before parking, and if there is a crash. Although both strategies result in equilibria, given the cars are able to park without crashing, there is an advantage is selecting the one shown on the right, as the sum of distances travelled by all cars is smaller.
To select amongst the multiple equilibria, we consider two main types of optimality criteria, social welfare and social fairness. Social welfare maximises the sum of the agents’ rewards, whereas social cost minimises this value. On the other hand, social fairness, which minimises the differences between the objectives of individual players, is a novel optimality criterion inspired from economics that is distinct from the use of fairness in verification. Both optimality criteria can be employed for Nash and correlated equilibria.
The image shows optimal values for agents trying to send packets using the slotted Aloha protocol. In this setting, if k users try to send a packet in the same time slot, the probability that each of them being successful is q/k, where q is a value between 0 and 1. If unsuccessful, an agent needs to wait a number of slots before resending set according to the Aloha’s exponential backoff scheme. In the figure, SW_{i} correspond to the optimal values (expected times to send their packets) for agent i for both SWNE (social welfare Nash equilibria) and SWCE (social welfare correlated equilibria) for the cases of two, three and four users. We see that for these types of equilibria the values are different for each user, being lower for the agent that sends first and higher for that who sends last. On the other hand, for SFNE (social fair Nash equilibria) and SFCE (social fair correlated equilibria) the values for all agents coincide while for the latter the overall sum is only than 2% smaller when comparing to the social-welfare optimal variants.
We are developing neuro-symbolic concurrent stochastic games, in which agents operate concurrently in a shared environment that they observe through perception modules implemented as machine learning components, and specifically neural networks. We are working on verification and strategy synthesis for (discounted, infinite-horizon) zero-zum properties for this variant of concurrent stochastic games. We have also studied (undiscounted, finite-horizon) equilibria synthesis problem, which we have applied to the VCAS autonomous aircraft controller (also studied in the Robustness Guarantees for Bayesian Neural Networks theme) implemented as a ReLU network.
The figure plots the altitude h for equilibria and zero-sum strategies when maximising h for a given instant k. It can be seen that, with respect to the safety criterion of avoiding a near mid-air collision, equilibria strategies allow the two aircraft to reach a safe configuration within a shorter horizon, which would be missed by a zero-sum analysis.
To know more about these models and analysis techniques, follow the links below.