Point-based Value Iteration for Neuro-Symbolic POMDPs

[YSN+23] Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska. Point-based Value Iteration for Neuro-Symbolic POMDPs. Technical report arXiv:2306.17639, arXiv. Paper under submission. 2023. [pdf] [bib] https://arxiv.org/abs/2306.17639

Downloads:

pdf (2.34 MB) $bib$ bib

Links: [Google] [Google Scholar] [CiteSeer]

Available from: https://arxiv.org/abs/2306.17639

Abstract. Neuro-symbolic artificial intelligence is an emerging area that combines traditional symbolic techniques with neural networks. In this paper, we consider its application to sequential decision making under uncertainty. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), which model an agent that perceives a continuous-state environment using a neural network and makes decisions symbolically, and study the problem of optimising discounted cumulative rewards. This requires functions over continuous-state beliefs, for which we propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the continuous-state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability by exploiting the underlying structure of the continuous-state model and the neural perception mechanism. The first is a classical (exact) value iteration algorithm extending α-functions of Porta et al (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, dynamic car parking and an aircraft collision avoidance system, by synthesising (approximately) optimal strategies. An experimental comparison with the finite-state POMDP solver SARSOP demonstrates that NS-HSVI is more robust to particle disturbances.