Can Assurance Help Build AI Systems That We Can Trust?

Key Insights from the AI Standards Hub Global Summit in Glasgow
AI systems are being deployed at scale across every sector of the economy. But as AI systems become more capable, and are deployed and used in more high-stakes contexts, we need to be sure that they will do what they are supposed to do, safely and reliably. In order for AI to deliver on its potential, and do so responsibly, we need to develop comprehensive infrastructure to create standards, evaluation frameworks, and independent oversight mechanisms that allow us to verify these systems are safe, reliable, and accountable.
Last week, Partnership on AI partnered with The Alan Turing Institute, the British Standards Institution, the UK’s National Physical Laboratory (NPL), and four other summit partners to co-host the AI Standards Hub Global Summit in Glasgow. We joined AI assurance experts, researchers, policymakers, and practitioners to discuss how to build the assurance infrastructure that promotes the development of high-quality, safe AI systems and empowers both citizens and enterprises to adopt them with calibrated trust: a clear-eyed understanding of AI’s capabilities and its limitations.
We co-hosted a workshop on AI assurance with the UK’s National Physical Laboratory, building on NPL’s recently announced Centre for AI Measurement and PAI’s recently released papers on Strengthening the AI Assurance Ecosystem. Four key themes emerged from our conversations.
The author, Jacob Pratt (left), in a workshop panel discussion on AI assurance at the AI Standards Hub Global Summit in Glasgow
Assurance can’t stop at deployment
Assurance at each level of the AI value chain helps to build justified trust in AI systems, ensuring that they are both trusted and trustworthy. Yet we heard from AI assurers that the majority of assurance activity has focused on evaluating systems before they are deployed. Post-deployment monitoring, despite being a foundational requirement for any credible assurance framework, remains the least requested assurance service in the ecosystem. As people trust AI agents to take more real-world actions, failures in planning, tool use, and execution may go unseen, which makes real-time failure detection and ongoing assurance especially important.
Deployers aren’t engaging independent assurers enough. That needs to change
Demand for external/independent assurance services is currently low. This isn’t because deployers do not care about risk but because the ecosystem has yet to mature. In our recently released paper Demand and Incentives for External AI Assurance, we’ve mapped the reasons why demand has stalled: a lack of clear regulatory expectations, limited awareness of what independent assurance can offer, concerns about exposing proprietary systems, and limited knowledge of the risks of emerging systems.
We’ve also mapped out policy levers that can increase demand, and so we asked workshop attendees what actions they see as the most promising. Developing legislation was the most popular option, chosen by 46% of 76 respondents, with greater transparency through use case registers and incident reporting mechanisms coming in a close second at 41%. Promoting insurance and providing legal safe harbor for assured systems were less popular choices, perhaps reflecting that these mechanisms are less well understood, or perceived as less effective. However, we view all these initiatives as worth exploring further and educating the field on.
Frontier model risks demand state-of-the-art evaluation standards
External assurance is perhaps most critical for frontier AI models. The irreversible, high-stakes nature of frontier model risks, such as chemical, biological, radiological, or nuclear (CBRN) risks, demands pre-deployment evaluations conducted by trusted, independent assurers. Just as cybersecurity defenses must continuously evolve to keep pace with emerging threats, AI evaluations must advance alongside rapidly shifting model capabilities to remain useful.
Standards will be crucial in enabling this, but they need to be adaptable. Rigid and prescriptive standards risk becoming obsolete as the technology moves forward. At the Summit, we heard how process standards that establish how rigorous evaluations are conducted may provide this adaptability, though this will need to be paired with trusted assurers with the authority to communicate their expert judgements. We explore how to build justified trust in assurers in our latest paper.
Agentic systems are outpacing the standards that should govern them
Coding agents, autonomous task completion systems, and other agentic tools are being adopted faster than standard frameworks can adapt, with formal ISO standardization of AI agent guidance still in the early roadmapping stage. Because official standardization cannot keep pace with adoption, voluntary frameworks, such as those developed by multistakeholder organizations like ours, are essential to filling that gap. This is what we are advancing with our work in Prioritizing Real-Time Failure Detection in AI Agents.
At the Summit, we heard people calling for actions that speed up the development of standards, build out the supporting assurance infrastructure, and increase demand for external assurance along the AI system lifecycle. Glasgow was a valuable catalyst, but challenges remain: 52% of workshop attendees identified low demand for assurance as one of the biggest gaps in the ecosystem, driven largely by a lack of market incentives. Given the scale of AI’s impacts today, closing that gap requires fast, coordinated action between policymakers, standards developers, assurers, industry, academia and civil society – before further systems reach the public without adequate assurance that they are safe and effective.
PAI is committed to building a strong AI Assurance Ecosystem. We believe this will reduce harms to consumers, foster transparent and efficient markets, and ultimately drive greater adoption of high-quality AI systems that work for people and society. As a multistakeholder convener, we connect experts from across the ecosystem to develop guidance and recommendations that inform the responsible development and deployment of AI. Our role is especially crucial in this fragmented, evolving area. To learn more and keep up with our work in this space, sign up for our newsletter.

Last week, the Partnership on AI (PAI) co-hosted the AI Standards Hub Global Summit in Glasgow alongside several other organizations. The goal was to discuss how to build the infrastructure needed to assure that AI systems are safe, reliable, and accountable as they are deployed at scale across sectors of the economy. Four key themes emerged from the discussions:
1. Assurance is needed not only before deployment but also after it, with real-time failure detection and ongoing assurance, especially as people trust AI agents to take more real-world actions.
2. Demand for external/independent assurance services is currently low due to reasons such as a lack of clear regulatory expectations, limited awareness, concerns about proprietary systems, and limited knowledge of emerging risks.
3. Standards will be crucial in enabling the advancement of AI evaluations alongside rapidly shifting model capabilities, but they need to be adaptable to avoid becoming obsolete.
4. Agentic systems are being adopted faster than standard frameworks can adapt, necessitating the use of voluntary frameworks like those developed by PAI.
PAI is committed to building a strong AI Assurance Ecosystem to reduce harms, foster transparent markets, and drive greater adoption of high-quality AI systems.
