IBM Research team

VAKRA, a new tool-grounded executable benchmark from IBM Research, evaluates AI agents' ability to reason end-to-end in complex, m