VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots

Danil S. Grigorev¹,², Alexey K. Kovalev¹,³, Aleksandr I. Panov¹,³

¹MIPT, Dolgoprudny, Russia, ²Pyatigorsk State University, Russia, ³AIRI, Moscow, Russia

Abstract

Ensuring reliable and efficient task planning remains a critical challenge in robotics. Modern planning systems can generate action sequences that appear correct at first glance but contain hidden errors that only become evident during execution. We propose VerifyLLM, a framework that combines Large Language Models (LLMs) with Linear Temporal Logic (LTL) for systematic pre-execution verification of robotic task plans. Our approach consists of two steps: first, natural language instructions are converted into LTL formulas; second, the action sequence is analyzed using LLM reasoning guided by these formal constraints. The system identifies three types of plan inconsistencies: position errors, missing prerequisites, and redundant actions. Testing on datasets of varying complexity shows improved plan reliability across diverse household scenarios.

Method

VerifyLLM employs a two-stage architecture that combines formal logic with natural language understanding. The Translation Module converts task descriptions into Linear Temporal Logic formulas using few-shot prompting, capturing temporal dependencies and logical constraints. The Verification Module analyzes action sequences using a sliding window approach (optimal size: 5 actions), identifying and correcting three critical error types through LLM reasoning guided by formal LTL specifications.
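The sliding-window step can be illustrated with a minimal sketch. Only the window size of 5 comes from the text above; the stride of 1, the function name `sliding_windows`, and the example plan are assumptions made for illustration, and the LLM call that would inspect each window against the LTL formula is deliberately left out.

```python
from typing import Iterator, List, Tuple


def sliding_windows(actions: List[str], size: int = 5) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, window) pairs that together cover the whole plan.

    Windows overlap and advance one action at a time (an assumed stride).
    A plan shorter than `size` is returned as a single window.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if len(actions) <= size:
        yield 0, list(actions)
        return
    for start in range(len(actions) - size + 1):
        yield start, actions[start:start + size]


# Hypothetical household plan; the action names are illustrative only.
plan = [
    "walk to kitchen",
    "open fridge",
    "grab milk",
    "close fridge",
    "walk to cat",
    "pour milk into bowl",
    "put bowl near cat",
]

windows = list(sliding_windows(plan))
for start, window in windows:
    # In a full system, each window would be passed to the verifier here.
    print(start, window)
```

With seven actions and a window of five, this produces three overlapping windows starting at indices 0, 1, and 2, so every action is seen in context with its neighbors at least once.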

Experimental Results

We evaluated our approach on 71 household task instructions from the VirtualHome dataset, using different underlying language models:

Method	LCS Similarity	Missing Actions	Extra Actions	Order Errors
Baseline (Llama-3.2-1B)	0.0717	10.28	9.14	16.48
CoT Optimizer	0.0705	10.38	9.35	16.80
VerifyLLM (Llama)	0.0982	11.18	9.13	15.12
VerifyLLM (Claude)	0.183	10.17	8.32	9.47
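For readers unfamiliar with the LCS Similarity column, the sketch below shows one common way to score a predicted plan against a reference plan using the longest common subsequence. The normalization by the longer sequence length is an assumption, since the page does not state the exact definition, and the function names and example plans are hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """LCS length divided by the longer sequence length (assumed normalization)."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty plans are treated as identical
    return lcs_length(predicted, reference) / longest


# Illustrative plans, not taken from the dataset.
reference = ["open fridge", "grab milk", "close fridge", "pour milk"]
predicted = ["grab milk", "open fridge", "pour milk"]

print(lcs_similarity(predicted, reference))  # 0.5
```

Here the longest common subsequence has two actions (for example "grab milk" followed by "pour milk"), and the longer plan has four, giving 0.5. A score of this kind rewards actions that appear in the right relative order, which is why it moves together with the order-error column.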

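With Claude as the underlying model, VerifyLLM reduces order errors by about 40% (16.48 to 9.47) and raises LCS similarity roughly 2.6x (0.0717 to 0.183) relative to the Llama-3.2-1B baseline. With Llama as the underlying model the gains are smaller (LCS similarity 0.0982, order errors 15.12), and missing actions rise slightly from 10.28 to 11.18. The CoT Optimizer does not improve on the baseline on any of the four metrics.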
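The headline figures can be reproduced from the table with simple arithmetic. The sketch below assumes they compare the VerifyLLM (Claude) row with the Llama-3.2-1B baseline row, which is the pairing that matches the reported numbers; the helper names are illustrative.

```python
# Values copied from the results table above.
baseline = {"lcs": 0.0717, "order_errors": 16.48}
variants = {
    "VerifyLLM (Llama)": {"lcs": 0.0982, "order_errors": 15.12},
    "VerifyLLM (Claude)": {"lcs": 0.183, "order_errors": 9.47},
}


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a lower-is-better metric dropped."""
    return (before - after) / before


def improvement_factor(before: float, after: float) -> float:
    """Multiplicative gain for a higher-is-better metric."""
    return after / before


for name, row in variants.items():
    drop = relative_reduction(baseline["order_errors"], row["order_errors"])
    gain = improvement_factor(baseline["lcs"], row["lcs"])
    print(f"{name}: order errors down {drop:.1%}, LCS similarity x{gain:.2f}")
```

This prints a reduction of 8.3% and a factor of 1.37 for the Llama variant, and 42.5% and 2.55 for the Claude variant.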

BibTeX

@article{grigorev2024verifyllm,
  title={VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots},
  author={Grigorev, Danil S. and Kovalev, Alexey K. and Panov, Aleksandr I.},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
