Reinforcement Learning Based Warm Initialization for Constrained Open-System Quantum Optimal Control: A Controlled Budget-Matched RL-GRAPE Benchmark
Ricciardi Celsi, Lorenzo
2026-01-01
Abstract
Superconducting-qubit control is fundamentally constrained by decoherence, finite bandwidth, and hardware-limited drive amplitudes, making high-fidelity state preparation sensitive to optimizer initialization under non-convex open-system dynamics. We propose a hybrid reinforcement learning (RL)–quantum optimal control (QOC) pipeline in which a lightweight, tabular, model-free RL agent is trained offline in simulation to generate feasible, bounded seed pulses, which are subsequently refined via GRAPE under Lindblad dynamics. Hard amplitude constraints are enforced consistently across both stages, ensuring strict feasibility throughout optimization. Performance is evaluated using a budget-matched protocol based on fidelity evaluations (F-evals), enabling controlled comparison with random-start multi-start GRAPE. On a transmon-like qubit benchmark with relaxation and dephasing, RL warm-starting reduces the median online refinement effort in the adopted finite-difference GRAPE implementation from 7568 to 3543 F-evals (2.14× reduction) while achieving terminal state fidelity ≥0.995 under identical constraints and evaluation budgets. We provide a theoretical interpretation of the improvement in terms of basin-of-attraction probability shaping in constrained control landscapes and an amortized cost analysis showing that the offline RL cost is recovered after a small number of reuse cycles. The results support the view that learning-based initialization can improve warm-start quality relative to uninformed feasible multi-start in constrained open-system quantum-control benchmarks, while broader practical comparison against stronger physics-guided seeds remains for future work.
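The reported reduction and the amortization argument can be checked with a few lines of arithmetic. The sketch below uses only the two median F-eval counts quoted in the abstract. The offline RL training cost is not stated here, so `HYPOTHETICAL_OFFLINE_COST` is an illustrative placeholder, and the function names are assumptions introduced for this example rather than anything from the paper.

```python
import math

# Median online refinement effort, in fidelity evaluations (F-evals),
# as quoted in the abstract.
MEDIAN_FEVALS_RANDOM_START = 7568
MEDIAN_FEVALS_RL_WARM_START = 3543


def reduction_factor(baseline: int, warm: int) -> float:
    """Ratio of baseline effort to warm-started effort."""
    return baseline / warm


def break_even_reuses(offline_cost: float,
                      baseline: int = MEDIAN_FEVALS_RANDOM_START,
                      warm: int = MEDIAN_FEVALS_RL_WARM_START) -> int:
    """Smallest number of reuse cycles after which the cumulative
    per-run saving covers a one-off offline cost (all in F-evals)."""
    saving_per_run = baseline - warm
    if saving_per_run <= 0:
        raise ValueError("warm start must be cheaper than the baseline")
    return math.ceil(offline_cost / saving_per_run)


# Assumption: an illustrative offline training cost, NOT a value from the paper.
HYPOTHETICAL_OFFLINE_COST = 20_000

if __name__ == "__main__":
    factor = reduction_factor(MEDIAN_FEVALS_RANDOM_START,
                              MEDIAN_FEVALS_RL_WARM_START)
    print(f"reduction factor: {factor:.2f}x")
    print("break-even reuses:", break_even_reuses(HYPOTHETICAL_OFFLINE_COST))
```

With these medians each reuse saves 4025 F-evals, so the break-even count is simply the offline cost divided by 4025, rounded up. The actual figure depends on the training cost reported in the full paper.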

