PULSE: 100x bandwidth reduction makes distributed RL training practical over commodity internet
Paper: https://arxiv.org/abs/2602.03839

We built a system that enables distributed RL training over commodity internet connections. Weight synchronization drops from 14 GB to approximately 108 MB per update for a 7B model, completely lossless.

Distributed RL separates training from inference. Training nodes remain centralized with fast interconnects, but inference nodes need fresh weights delivered over whatever network they have. For large models, this weight transfer becomes the bottleneck. Transferring 14 GB every few steps over commodity internet means waiting, […]
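To put those numbers in perspective, here is a minimal back-of-the-envelope sketch of the per-sync transfer time. The 14 GB and ~108 MB figures come from the post; the 100 Mbps link speed is an illustrative assumption, not a number from the paper.

```python
# Back-of-the-envelope transfer-time comparison for one weight sync.
# 14 GB and 108 MB are the figures from the post; the 100 Mbps link
# speed is an assumed commodity-internet rate, for illustration only.

FULL_WEIGHTS_BYTES = 14 * 1024**3    # full 7B-model weights, ~14 GB
PULSE_UPDATE_BYTES = 108 * 1024**2   # lossless compressed update, ~108 MB
LINK_BYTES_PER_SEC = 100e6 / 8       # assumed 100 Mbps link, in bytes/s

def transfer_seconds(num_bytes: float, bytes_per_sec: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and congestion."""
    return num_bytes / bytes_per_sec

full = transfer_seconds(FULL_WEIGHTS_BYTES, LINK_BYTES_PER_SEC)
pulse = transfer_seconds(PULSE_UPDATE_BYTES, LINK_BYTES_PER_SEC)
print(f"full weights: {full / 60:.1f} min per sync")  # ~20 min
print(f"PULSE update: {pulse:.1f} s per sync")        # ~9 s
print(f"reduction:    {full / pulse:.0f}x")           # ~130x
```

At that assumed link speed, a full 14 GB sync stalls inference nodes for roughly twenty minutes every few steps, while a ~108 MB update arrives in seconds, which is what makes the commodity-internet setting workable.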