PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure
arXiv:2603.01309v1 Announce Type: cross Abstract: When data is scarce or mistakes are costly, average-case metrics fall short. What a practitioner needs is a guarantee: with probability at least $1-delta$, the learned policy is $varepsilon$-close to optimal after $N$ episodes. This is the PAC promise, and between 2018 and 2025 the RL theory community made striking progress on when such promises can be kept. We survey that progress. Our organizing tool is the Coverage-Structure-Objective (CSO) framework, proposed here, which […]