VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots

Danil S. Grigorev1,2, Alexey K. Kovalev1,3, Aleksandr I. Panov1,3
1MIPT, Dolgoprudny, Russia, 2Pyatigorsk State University, Russia, 3AIRI, Moscow, Russia

VerifyLLM combines Large Language Models with Linear Temporal Logic for systematic pre-execution verification of robotic task plans.

Abstract

In the field of robotics, researchers face a critical challenge in ensuring reliable and efficient task planning. Modern planning systems generate action sequences that appear correct at first glance but contain hidden errors that only become evident during execution. We propose VerifyLLM, a novel framework that combines Large Language Models (LLMs) with Linear Temporal Logic (LTL) for systematic pre-execution verification of robotic task plans. Our approach consists of two key steps: first, the conversion of natural language instructions into LTL formulas, followed by comprehensive analysis of action sequences using LLM reasoning capabilities enhanced by formal constraints. The system identifies three critical types of plan inconsistencies: position errors, missing prerequisites, and redundant actions. Rigorous testing on datasets of varying complexity demonstrates significant improvements in plan reliability across diverse household scenarios.

Method

VerifyLLM employs a two-stage architecture that combines formal logic with natural language understanding. The Translation Module converts task descriptions into Linear Temporal Logic formulas using few-shot prompting, capturing temporal dependencies and logical constraints. The Verification Module analyzes action sequences using a sliding window approach (optimal size: 5 actions), identifying and correcting three critical error types through LLM reasoning guided by formal LTL specifications.

Example: Give Milk to Cat Task

Original Plan:

1. Walk to home office → 2. Walk to bedroom → 3. Walk to bedroom → 4. Walk to home office

Issues Detected by VerifyLLM:

  • Redundant actions: "Walk to bedroom" appears twice consecutively
  • Missing prerequisites: No actions for getting milk, preparing bowl, or feeding cat
  • Position errors: Actions not logically sequenced for the task

Optimized Plan (6 actions):

1. Walk to bedroom → 2. Walk to home office → 3. Pick up milk → 4. Walk to kitchen → 5. Pour milk in bowl → 6. Place bowl on floor

Demo

Experimental Results

We evaluated our approach on the VirtualHome dataset with 71 household task instructions across different language models:

Method LCS Similarity Missing Actions Extra Actions Order Errors
Baseline (Llama-3.2-1B) 0.0717 10.28 9.14 16.48
CoT Optimizer 0.0705 10.38 9.35 16.80
VerifyLLM (Llama) 0.0982 11.18 9.13 15.12
VerifyLLM (Claude) 0.183 10.17 8.32 9.47

VerifyLLM achieves significant improvements: 40% reduction in order errors and 2.6x better LCS similarity compared to baseline methods. The system successfully identifies and corrects the three main types of plan errors across diverse scenarios.

BibTeX

@article{grigorev2024verifyllm,
  title={VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots},
  author={Grigorev, Danil S. and Kovalev, Alexey K. and Panov, Aleksandr I.},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}