A recent workshop co-organized by researchers at Stanford, MIT, Princeton, and Humane Intelligence highlighted the need for better third-party evaluations of AI systems, sparking a renewed focus on trustworthy assessments in the industry.

What Happened

On October 28, 2024, researchers from Stanford HAI's Center for Research on Foundation Models, Massachusetts Institute of Technology (MIT), Princeton’s Center for Information Technology Policy, and Humane Intelligence convened a virtual workshop to discuss the importance of third-party AI evaluations. The event brought together leaders from academia, industry, civil society, and government to articulate a vision for trustworthy assessments.

The workshop explored three key areas: evaluations in practice, evaluations by design, and evaluation law and policy. Keynote speaker Rumman Chowdhury, CEO of Humane Intelligence, emphasized the need for independent oversight, comparing the current state of AI development to a new Gilded Age characterized by major economic disruption and a lack of protections for users and citizens.

Background and Context

The use of general-purpose AI systems has become ubiquitous, with millions of people worldwide relying on tools like ChatGPT, Claude, and Stable Diffusion. While these systems offer significant benefits, they also pose serious risks, including the production of non-consensual intimate imagery, facilitation of bioweapon development, and contribution to biased decisions.

Third-party AI evaluations are crucial for assessing these risks because they provide an independent perspective that is not shaped by company interests. The software security field has developed reporting infrastructure, legal protections, and incentives (such as bug bounties) to encourage third-party evaluation, but no comparable support yet exists for general-purpose AI systems.

Why It Matters to the Industry

The lack of trustworthy assessments in the AI industry can have significant consequences. Without independent oversight, companies may be able to manipulate evaluations to suit their interests, leading to biased results and a lack of accountability. This can result in the proliferation of harmful AI systems that pose risks to users and society as a whole.

For adult-industry platforms and operators, trustworthy assessments are particularly important due to the sensitive nature of the content they host. The ability to verify the safety and efficacy of AI-powered moderation tools, for example, is critical in ensuring compliance with regulations and protecting user privacy.

What Comes Next

In response to the workshop's findings, OpenAI has published a framework for trustworthy third-party evaluations, outlining how external assessors should test frontier models for capability, safeguards, and the validity of their results. The document emphasizes the importance of defining model versions, task settings, scoring methods, and conditions under which results were collected, in order to make evaluations repeatable, auditable, and hard to game.
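
To make the idea of a repeatable, auditable evaluation more concrete, here is a minimal Python sketch of how the four elements named above (model version, task setting, scoring method, and collection conditions) could be recorded alongside a result. The class name `EvalRecord`, its field names, and the sample values are hypothetical illustrations and are not taken from the framework itself; the fingerprint is simply a SHA-256 hash of the record's canonical JSON form, which is one possible way to let a third party confirm that two results were collected under identical stated settings.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class EvalRecord:
    """Hypothetical record of the settings under which a result was collected."""

    model_version: str   # exact model identifier that was tested
    task_setting: str    # e.g. prompt format or sampling configuration
    scoring_method: str  # how outputs were graded
    conditions: dict = field(default_factory=dict)  # date, seed, tool access, ...

    def fingerprint(self) -> str:
        # Serialize with sorted keys so the hash does not depend on key order.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only; none of these describe a real evaluation.
first = EvalRecord(
    model_version="example-model-v1",
    task_setting="zero-shot, temperature 0",
    scoring_method="exact match",
    conditions={"collected_on": "2024-10-28", "seed": 7},
)
second = EvalRecord(
    model_version="example-model-v1",
    task_setting="zero-shot, temperature 0",
    scoring_method="exact match",
    conditions={"seed": 7, "collected_on": "2024-10-28"},
)
changed = EvalRecord(
    model_version="example-model-v2",
    task_setting="zero-shot, temperature 0",
    scoring_method="exact match",
    conditions={"collected_on": "2024-10-28", "seed": 7},
)

print(first.fingerprint() == second.fingerprint())   # same settings, same hash
print(first.fingerprint() == changed.fingerprint())  # different model version
```

Under these assumptions, two reports that carry the same fingerprint claim the same stated settings, while any change to a recorded field (such as the model version) produces a different one. A fingerprint of this kind only covers what was written down; it does not by itself prove that the evaluation was actually run as described.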

The publication of this framework marks an important step towards establishing a common standard for third-party AI evaluations. However, it remains to be seen whether labs and independent evaluators will converge on the same procedures, and whether comparisons across vendors will become cleaner as a result.

Key Facts

  • A workshop co-organized by researchers at Stanford, MIT, Princeton, and Humane Intelligence highlighted the need for better third-party evaluations of AI systems.
  • Rumman Chowdhury, CEO of Humane Intelligence, emphasized the importance of independent oversight in AI development.
  • OpenAI has published a framework for trustworthy third-party evaluations, outlining how external assessors should test frontier models.
  • The document emphasizes the need to define model versions, task settings, scoring methods, and conditions under which results were collected.
  • The publication of this framework marks an important step towards establishing a common standard for third-party AI evaluations.