Reinforcement Learning Based Warm Initialization for Constrained Open-System Quantum Optimal Control: A Controlled Budget-Matched RL-GRAPE Benchmark

Gabriele, Daniele;Ricciardi Celsi, Lorenzo

2026-01-01

Abstract

Superconducting-qubit control is fundamentally constrained by decoherence, finite bandwidth, and hardware-limited drive amplitudes, making high-fidelity state preparation sensitive to optimizer initialization under non-convex open-system dynamics. We propose a hybrid reinforcement learning (RL)–quantum optimal control (QOC) pipeline in which a lightweight, tabular, model-free RL agent is trained offline in simulation to generate feasible, bounded seed pulses, which are subsequently refined via GRAPE under Lindblad dynamics. Hard amplitude constraints are enforced consistently across both stages, ensuring strict feasibility throughout optimization. Performance is evaluated using a budget-matched protocol based on fidelity evaluations (F-evals), enabling controlled comparison with random-start multi-start GRAPE. On a transmon-like qubit benchmark with relaxation and dephasing, RL warm-starting reduces the median online refinement effort in the adopted finite-difference GRAPE implementation from 7568 to 3543 F-evals (2.14× reduction) while achieving terminal state fidelity ≥0.995 under identical constraints and evaluation budgets. We provide a theoretical interpretation of the improvement in terms of basin-of-attraction probability shaping in constrained control landscapes and an amortized cost analysis showing that the offline RL cost is recovered after a small number of reuse cycles. The results support the view that learning-based initialization can improve warm-start quality relative to uninformed feasible multi-start in constrained open-system quantum-control benchmarks, while broader practical comparison against stronger physics-guided seeds remains for future work.
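The amortized-cost argument in the abstract can be illustrated with a short arithmetic sketch. The two median F-eval counts are taken from the abstract; the offline training cost is not stated there, so `hypothetical_offline_cost` below is a placeholder chosen only for illustration, and treating the medians as a fixed per-task cost is a simplifying assumption rather than the authors' analysis.

```python
import math

# Median online refinement cost reported in the abstract, in F-evals.
MEDIAN_RANDOM_START = 7568
MEDIAN_RL_WARM_START = 3543


def speedup(baseline: int, warm: int) -> float:
    """Ratio of baseline to warm-started refinement cost."""
    return baseline / warm


def break_even_reuses(offline_cost: int, baseline: int, warm: int) -> int:
    """Smallest number of reuse cycles after which the one-off offline
    cost is covered by the per-cycle saving (baseline - warm)."""
    saving = baseline - warm
    if saving <= 0:
        raise ValueError("warm start must be cheaper than the baseline")
    return math.ceil(offline_cost / saving)


if __name__ == "__main__":
    # Reproduces the 2.14x figure quoted in the abstract.
    print(round(speedup(MEDIAN_RANDOM_START, MEDIAN_RL_WARM_START), 2))

    # Hypothetical offline RL training cost in F-evals (NOT from the source).
    hypothetical_offline_cost = 20_000
    print(break_even_reuses(hypothetical_offline_cost,
                            MEDIAN_RANDOM_START, MEDIAN_RL_WARM_START))
```

With these numbers the per-cycle saving is 4025 F-evals, so any offline cost of the same order of magnitude is recovered within a handful of reuses, which is consistent with the abstract's qualitative claim.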

Year: 2026

Keywords: GRAPE; hybrid optimization; open quantum systems; quantum optimal control; reinforcement learning; superconducting qubits

Appears in types: 1.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12606/47606
