Self-Improving World Modelling with Latent Actions
arXiv:2602.06130v1 Abstract: Internal modelling of the world (predicting transitions between previous states $X$ and next states $Y$ under actions $Z$) is essential to reasoning and planning for LLMs and VLMs. Learning such models typically requires costly action-labelled trajectories. We propose SWIRL, a self-improvement framework that learns from state-only sequences by treating actions as a latent variable and alternating between Forward World Modelling (FWM) $P_\theta(Y|X,Z)$ and Inverse Dynamics Modelling (IDM) $Q_\phi(Z|X,Y)$. SWIRL iterates […]