Differential Voting: Loss Functions For Axiomatically Diverse Aggregation of Heterogeneous Preferences
arXiv:2601.18824v1 Announce Type: new Abstract: Reinforcement learning from human feedback (RLHF) implicitly aggregates heterogeneous human preferences into a single utility function, even though the underlying utilities of the participants are in practice diverse. Hence, RLHF can be viewed as a form of voting, where the aggregation mechanism is defined by the loss function. Although Arrow’s Impossibility Theorem suggests that different mechanisms satisfy different sets of desirable axioms, most existing methods rely on a single aggregation principle, typically the […]