Multimodal Supervisory Graphs for Persistent World Modeling in Generative AI
Generative models have achieved remarkable success in producing realistic images and short video clips, but existing approaches struggle to maintain persistent world coherence over long durations and across multiple modalities. We propose Multimodal Supervisory Graphs (MSG), a novel framework for world modeling that unifies geometry (3D structure), identity (consistent entities), physics (dynamic behavior), and interaction (user/agent inputs) in a single abstract representation. MSG represents the environment as a dynamic latent graph, factorized by these four aspects and trained with […]