Beyond Next-Token Prediction: A Standards-Aligned Survey of Autoregressive LLM Failure Modes, Deployment Patterns, and the Potential Role of World Models

IRIS

This paper is a focused, standards-aligned survey of where autoregressive (AR) large language models (LLMs) tend to break down when deployed inside industrial informatics workflows that must satisfy long-horizon objectives, hard constraints, traceability, and functional-safety obligations (e.g., IEC 61508/ISO 26262/ISO 21448). Rather than claiming new algorithms or experiments, we synthesize and organize prior work into (i) a control-oriented taxonomy of four AR failure modes that recur in practice (compounding error, myopic objectives, data brittleness/hallucinations, and scaling/latency inefficiencies), (ii) a catalog of standards-compatible deployment patterns that mitigate these issues (human-gated LLM-in-the-loop, retrieval + verification pipelines, planner-of-record architectures, and runtime assurance envelopes), and (iii) an operational decision framework (criteria table with observable proxies, a stepwise decision procedure, and worked examples) for deciding when token-centric mitigations are sufficient versus when state/world-model components become warranted. Joint Embedding Predictive Architectures (JEPA) and Hierarchical JEPA (H-JEPA) JEPA are proposed as representative state-predictive architectures, with discussion explicitly bounded by currently available empirical evidence; we explicitly note that the published evidence base is currently concentrated on vision/multimodal benchmarks and that industrial control validation remains limited. To make evidence boundaries transparent, we introduce (a) a survey method (scope, inclusion/exclusion criteria, and data-extraction fields), (b) a comparison matrix across representative prior systems, and (c) an evidence map that links each deployment pattern to peer-reviewed empirical findings and system reports.

Beyond Next-Token Prediction: A Standards-Aligned Survey of Autoregressive LLM Failure Modes, Deployment Patterns, and the Potential Role of World Models

Ricciardi Celsi, Lorenzo;McCann, James

2026-01-01

Abstract

This paper is a focused, standards-aligned survey of where autoregressive (AR) large language models (LLMs) tend to break down when deployed inside industrial informatics workflows that must satisfy long-horizon objectives, hard constraints, traceability, and functional-safety obligations (e.g., IEC 61508/ISO 26262/ISO 21448). Rather than claiming new algorithms or experiments, we synthesize and organize prior work into (i) a control-oriented taxonomy of four AR failure modes that recur in practice (compounding error, myopic objectives, data brittleness/hallucinations, and scaling/latency inefficiencies), (ii) a catalog of standards-compatible deployment patterns that mitigate these issues (human-gated LLM-in-the-loop, retrieval + verification pipelines, planner-of-record architectures, and runtime assurance envelopes), and (iii) an operational decision framework (criteria table with observable proxies, a stepwise decision procedure, and worked examples) for deciding when token-centric mitigations are sufficient versus when state/world-model components become warranted. Joint Embedding Predictive Architectures (JEPA) and Hierarchical JEPA (H-JEPA) JEPA are proposed as representative state-predictive architectures, with discussion explicitly bounded by currently available empirical evidence; we explicitly note that the published evidence base is currently concentrated on vision/multimodal benchmarks and that industrial control validation remains limited. To make evidence boundaries transparent, we introduce (a) a survey method (scope, inclusion/exclusion criteria, and data-extraction fields), (b) a comparison matrix across representative prior systems, and (c) an evidence map that links each deployment pattern to peer-reviewed empirical findings and system reports.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2026
			
	Parole chiave
	
				autoregressive language models
energy-based models
JEPA
rational AI
retrieval-augmented generation
safety-critical AI
verification and runtime assurance
world models
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12606/47625

Citazioni

ND

0

social impact