Reproducible DQN / Double DQN / Dueling comparison with diagnostics and generalization tests (LunarLander-v3)
I wanted to compare vanilla DQN, Double DQN (DDQN), and Dueling DDQN beyond final reward alone, so I built a structured training and evaluation setup around LunarLander-v3. Instead of tracking only episode return, I monitored:

• activation and gradient distributions
• update-to-data ratios for optimizer diagnostics
• action gap and Q-value dynamics
• win rate with 95% confidence intervals
• generalization via human-prefix rollouts

The strongest model (<9k params) achieves a 98.4% win rate (±0.24%, 95% CI) across 10k seeds. The […]
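For concreteness, here is a minimal NumPy sketch (my own illustration, not the post's actual code) of two of the pieces above: the Double DQN target computation, where the online network selects the next action and the target network evaluates it, and a normal-approximation confidence interval for a Bernoulli win rate:

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN bootstrap targets.

    q_online_next, q_target_next: (batch, n_actions) Q-values at s'.
    The online net picks argmax actions; the target net scores them,
    which decouples action selection from evaluation and reduces
    the overestimation bias of vanilla DQN.
    """
    best_actions = np.argmax(q_online_next, axis=1)
    q_eval = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_eval

def win_rate_ci(wins, n, z=1.96):
    """Win-rate estimate with normal-approximation CI half-width."""
    p = wins / n
    half_width = z * np.sqrt(p * (1.0 - p) / n)
    return p, half_width
```

With 9,840 wins out of 10,000 evaluation seeds, `win_rate_ci` gives p = 0.984 and a half-width of roughly ±0.25%, in the same ballpark as the reported ±0.24% (the post's exact figure may come from a Wilson or bootstrap interval).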