OpenAI's Codex has been used to power a self-improving tax software system that demonstrates significant gains in accuracy and efficiency through an AI-driven feedback loop. The system, developed in partnership with Thrive Holdings and Crete Professionals Alliance, processed 7,000 tax returns across participating firms, achieving final drafts with up to 97% accuracy without corrections.
Background and Context
The development of self-improving AI agents is a growing trend in the industry, with OpenAI's Codex emerging as a major driver behind this transformation. Codex is an AI system designed primarily for coding and coding assistance, but it is increasingly being used as a more general enterprise AI system for working with data pipelines, application programming interfaces (APIs), and workflow tools.
Real-world software often falters in unpredictable ways after deployment, requiring teams to spend weeks fixing bugs based on user feedback. However, by leveraging advanced agentic capabilities like those found in Codex, coupled with robust evaluation infrastructure and direct access to domain experts, it's now possible to build systems that self-improve.
How It Works
The self-improving tax software system powered by Codex runs a three-part loop. First, production traces are logged in detail. Second, practitioner corrections become targeted eval datasets. Third, Codex receives a precise package containing the failing trace, the new eval set, the full codebase, relevant skills, production data samples, expected tax-engine outputs, code examples, and eval-runner commands.
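To make the second and third parts of the loop concrete, here is a minimal Python sketch of how practitioner corrections might be turned into eval cases and bundled with a failing trace. Every name in it (Correction, EvalCase, ContextPackage, build_package, the field names, and the example command string) is hypothetical and chosen for illustration; the actual system's data model has not been published.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """A practitioner's fix to one drafted field (hypothetical shape)."""
    return_id: str
    field_name: str
    drafted_value: str
    corrected_value: str


@dataclass
class EvalCase:
    """One targeted eval: the field must match the practitioner's value."""
    return_id: str
    field_name: str
    expected: str


@dataclass
class ContextPackage:
    """Bundle handed to the coding agent (assumed structure)."""
    failing_trace: dict
    eval_set: list
    codebase_path: str
    eval_commands: list = field(default_factory=list)


def corrections_to_evals(corrections):
    # Only real changes become eval cases; untouched fields are skipped.
    return [
        EvalCase(c.return_id, c.field_name, c.corrected_value)
        for c in corrections
        if c.drafted_value != c.corrected_value
    ]


def build_package(trace, corrections, codebase_path, eval_commands):
    return ContextPackage(
        failing_trace=trace,
        eval_set=corrections_to_evals(corrections),
        codebase_path=codebase_path,
        eval_commands=list(eval_commands),
    )


corrections = [
    Correction("r1", "wages", "50000", "52000"),
    Correction("r1", "interest", "120", "120"),  # unchanged, so no eval case
]
package = build_package(
    {"return_id": "r1"}, corrections, "/repo", ["<eval-runner command>"]
)
print(len(package.eval_set))  # prints 1
```

The sketch omits the skills, production data samples, expected tax-engine outputs, and code examples that the package reportedly also contains; these would be further fields on the same bundle.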
Codex diagnoses the root cause of errors, proposes concrete fixes, runs targeted and regression evals, and generates a pull request. Only clear, bounded improvements are auto-applied, while tricky edge cases are routed to engineers for review. The loop closes as each deployed fix creates fresh production data, fueling the next cycle.
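The split between auto-applied fixes and engineer review can be pictured as a simple gate on eval results. The following Python sketch is an assumption about how such a gate might look, not the published implementation: the EvalResult fields, the route_fix function, and the max_files_changed threshold are all hypothetical stand-ins for whatever "clear, bounded" means in the real system.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Outcome of running evals against a proposed fix (hypothetical shape)."""
    targeted_passed: int
    targeted_total: int
    regression_passed: int
    regression_total: int
    files_changed: int


def route_fix(result, max_files_changed=3):
    """Return "auto_apply" only for clear, bounded fixes; otherwise escalate."""
    # The fix must resolve every targeted case derived from corrections.
    targeted_ok = (
        result.targeted_total > 0
        and result.targeted_passed == result.targeted_total
    )
    # It must not break anything that previously passed.
    regression_ok = result.regression_passed == result.regression_total
    # "Bounded" is approximated here by a cap on files touched (assumption).
    bounded = result.files_changed <= max_files_changed
    if targeted_ok and regression_ok and bounded:
        return "auto_apply"
    return "engineer_review"


print(route_fix(EvalResult(5, 5, 200, 200, 1)))  # prints auto_apply
print(route_fix(EvalResult(5, 5, 199, 200, 1)))  # prints engineer_review
```

Defaulting to engineer review whenever any check fails keeps the automated path conservative, which matches the description of only clear improvements being applied without a human in the loop.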
This isn't generic fine-tuning. It is an agent that gets better at the hardest parts of tax prep with every real return filed. In just six months, the system has processed 7,000 tax returns across participating firms, freeing accountants to spend more time on high-value client advisory work instead of tedious data entry.
Why It Matters
The development of self-improving AI agents like the one powered by Codex has significant implications for industries that rely on complex workflows, such as tax preparation. By leveraging advanced agentic capabilities and robust evaluation infrastructure, these systems can improve accuracy and efficiency over time, reducing the need for manual intervention.
This technology also highlights the growing trend towards "self-improving" AI agents that refine their own workflows through repeated use and feedback. As enterprise AI moves beyond basic automation and chatbot features into operationally autonomous systems, the potential applications of this technology extend far beyond tax preparation.
Key Facts
- The self-improving tax software system powered by Codex processed 7,000 tax returns across participating firms.
- Final drafts achieved up to 97% accuracy without corrections.
- Practitioners saved about one-third of their preparation time per return.
- Overall throughput increased by 50%.
- The system was developed in partnership with Thrive Holdings and Crete Professionals Alliance.
- Codex is an AI system designed primarily for coding and coding assistance but is increasingly being leveraged as a more general enterprise AI system.
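The time-saving and throughput figures in this list are consistent with each other under one simple assumption: that practitioner preparation time is the only bottleneck. The source does not say the 50% figure was derived this way, so the short Python check below is only a sanity calculation. The function name throughput_gain is made up for illustration.

```python
from fractions import Fraction


def throughput_gain(time_saved_fraction):
    """Relative throughput increase if each return takes less time.

    Assumes prep time per return is the only limit on throughput.
    """
    remaining = 1 - time_saved_fraction  # share of the original time still needed
    return 1 / remaining - 1             # extra returns per unit of time


gain = throughput_gain(Fraction(1, 3))
print(gain)  # prints 1/2
```

Saving one-third of the time per return leaves two-thirds, so the same hours cover 1 / (2/3) = 1.5 times as many returns, which is a 50% increase.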
What Comes Next
The same feedback-loop approach could carry over to other industries that rely on complex workflows. As this technology continues to evolve, applications may emerge in areas such as content moderation, age verification, and payment processing.
While the potential benefits of this technology are clear, there are also challenges to be addressed, including ensuring the security and integrity of these systems and addressing concerns around bias and fairness. As the industry continues to explore the possibilities of self-improving AI agents, it will be essential to prioritize transparency, accountability, and collaboration.