Nathan Habib and Pedro Cuenca introduce a new benchmarking tool, agent-eval, to measure the performance of coding agents using the