Researchers at Dharma AI publish paper on Direct Preference Optimization (DPO), a new method for fine-tuning language models that