Reinforcement Learning Methods for Neighborhood Selection in Local Search
Reinforcement learning has recently gained traction as a means to improve combinatorial optimization methods, yet its effectiveness within local search metaheuristics remains comparatively underexamined. In this study, we evaluate a range of reinforcement learning-based neighborhood selection strategies, namely multi-armed bandits (upper confidence bound, $ε$-greedy) and deep reinforcement learning methods (proximal policy optimization, double deep $Q$-network), and compare them against multiple baselines across three problems: the traveling salesman problem, the pickup and delivery problem with time […]