Understanding Formal Reasoning Failures in LLMs as Abstract Interpreters
Published in Workshop on Language Models and Programming Languages at SPLASH (LMPL), 2025
Recommended citation: Mitchell, J., Kim, B. H., Zhou, C., & Wang, C. (2025, October). Understanding Formal Reasoning Failures in LLMs as Abstract Interpreters. In Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages (pp. 71-83). https://dl.acm.org/doi/10.1145/3759425.3763389
Large language models (LLMs) are increasingly used for program verification, yet little is known about how they reason about program semantics during this process. In this work, we focus on abstract interpretation-based reasoning for invariant generation and introduce two novel prompting strategies that aim to elicit such reasoning from LLMs. We evaluate these strategies across several state-of-the-art LLMs on 22 programs from SV-COMP, a benchmark suite widely used in software verification. We analyze both the soundness of the generated invariants and the key thematic patterns in the models' reasoning errors. This work highlights new research opportunities at the intersection of LLMs and program verification, both for applying LLMs to verification tasks and for advancing their reasoning capabilities in this domain.
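To give a sense of the kind of reasoning involved, the following is a minimal sketch of interval abstract interpretation for loop-invariant generation. It is purely illustrative and not from the paper: it infers an interval invariant for the loop `x = 0; while x < 10: x += 1` by iterating to a fixpoint with the standard interval widening operator.

```python
# Illustrative sketch (not the paper's method): interval abstract
# interpretation inferring a loop invariant for:
#   x = 0; while x < 10: x += 1
NEG_INF, POS_INF = float("-inf"), float("inf")

def join(a, b):
    """Least upper bound of two intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(a, b):
    """Standard interval widening: unstable bounds jump to infinity."""
    lo = a[0] if b[0] >= a[0] else NEG_INF
    hi = a[1] if b[1] <= a[1] else POS_INF
    return (lo, hi)

def analyze():
    inv = (0, 0)                      # abstract state of x on loop entry
    while True:
        # Restrict to the loop guard x < 10, then apply the body x += 1.
        lo, hi = inv[0], min(inv[1], 9)
        body = (lo + 1, hi + 1)
        new = widen(inv, join(inv, body))
        if new == inv:
            return inv                # fixpoint reached: a loop invariant
        inv = new

print(analyze())                      # → (0, inf), i.e. the invariant x >= 0
```

Widening guarantees termination of the fixpoint iteration at the cost of precision (here it loses the upper bound `x <= 10`, which a subsequent narrowing pass could recover); a sound invariant generator, whether classical or LLM-driven, must produce intervals that over-approximate all reachable states in this way.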
Links: proceedings | arXiv
