Person

UC Berkeley

New research using MAST identifies why LLMs like Gemini-3-Flash, Kimi-K2, and GPT-OSS-120B fail in real-world IT automation tasks.