ORPO vs DPO vs KTO vs SimPO: which preference-optimization method should you choose in 2026?
SimPO removes the reference-model/log-ratio dependency, and the SimPO README reports that it can outperform DPO and its latest variants on AlpacaEval 2, MT-Bench, and Arena-Hard. The gains are hyperparameter-sensitive, though, especially to the learning rate, beta, and the gamma/beta ratio.
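
To make the "no reference model" point concrete, here is a minimal per-pair sketch of the two objectives in plain Python. It is an illustration, not the SimPO or DPO reference implementation: the function names, argument names, and default values for beta and gamma are assumptions chosen for the example, and the inputs are summed token log-probabilities you would compute elsewhere. DPO scores a pair by the policy-vs-reference log-ratio, while SimPO scores it by the length-normalized policy log-probability minus a target margin gamma.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_w: float, pol_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    pol_* / ref_* are summed log-probs of the chosen (w) and rejected (l)
    responses under the policy and the frozen reference model.
    beta=0.1 is an illustrative default, not a recommendation.
    """
    margin = beta * ((pol_w - ref_w) - (pol_l - ref_l))
    return -log_sigmoid(margin)


def simpo_loss(pol_w: float, pol_l: float, len_w: int, len_l: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one pair: no reference model is needed.

    The implicit reward is the average per-token log-prob (sum / length),
    and gamma is the target reward margin. beta=2.0 and gamma=1.0
    (gamma/beta = 0.5) are illustrative values only.
    """
    margin = beta * (pol_w / len_w - pol_l / len_l) - gamma
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical log-probs: chosen is 10 tokens, rejected is 15 tokens.
    print("DPO  :", dpo_loss(pol_w=-20.0, pol_l=-45.0, ref_w=-22.0, ref_l=-44.0))
    print("SimPO:", simpo_loss(pol_w=-20.0, pol_l=-45.0, len_w=10, len_l=15))
```

Note how gamma and beta interact in the SimPO margin: raising gamma relative to beta demands a larger gap in average log-probability before the loss flattens out, which is one reason the gamma/beta ratio has to be tuned alongside the learning rate.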