POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes
arXiv:2506.20406v2 Announce Type: replace Abstract: Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on strong positivity assumptions and lack robustness under partial data coverage, while offline reinforcement learning approaches typically focus on average training performance, lack statistical guarantees, and require solving complex optimization problems. To address these challenges, […]