Heuristics for the Traveling Salesperson Problem based on Reinforcement Learning

Marta Monaci
2021-01-01

Abstract

We use Reinforcement Learning, in particular a deep Q-Learning algorithm and an adaptation of two actor-critic algorithms (Proximal Policy Optimization and Phasic Policy Gradient), originally proposed for robotic control, to solve the metric Traveling Salesperson Problem (TSP). We introduce a convolutional model to approximate action-value and state-value functions, centered on the idea of using a weighted incidence matrix as the agent's representation of the graph at a given instant. Our computational experience shows that Q-Learning does not seem adequate for the TSP; nevertheless, we find that both PPO and PPG can achieve the same solutions as standard optimization algorithms with a smaller computational effort. We also find that our trained models are able to orient themselves in new, unseen graphs and under different cost distributions.
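The state representation described above can be illustrated with a minimal sketch: a node-by-edge incidence matrix whose nonzero entries carry the edge costs. The function name, the encoding convention, and the toy instance below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def weighted_incidence(n_nodes, edges):
    """Build a weighted node-by-edge incidence matrix for a TSP graph.

    `edges` is a list of (u, v, cost) tuples. Entry [i, k] holds the
    cost of edge k if node i is one of its endpoints, and 0 otherwise.
    """
    M = np.zeros((n_nodes, len(edges)))
    for k, (u, v, cost) in enumerate(edges):
        M[u, k] = cost
        M[v, k] = cost
    return M

# Toy 4-city metric instance (complete graph, 6 edges).
edges = [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 2.0),
         (1, 2, 1.5), (1, 3, 2.5), (2, 3, 1.0)]
state = weighted_incidence(4, edges)
print(state.shape)  # (4, 6): one row per city, one column per edge
```

A matrix of this shape could then be fed to a convolutional value or policy network, e.g. with visited edges zeroed out as the tour is built; that masking scheme is likewise a guess at one plausible design.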
traveling salesperson problem
reinforcement learning
heuristic
Files for this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12606/28661