Summary: Mechanisms to Verify International Agreements About AI Development

If world leaders agree to halt or limit AI development, they will need to verify that other nations are keeping their commitments. To this end, it helps to know where AI chips are, how they’re used, and what the AIs trained on them can do.
In this post, we informally summarize “Mechanisms to Verify International Agreements About AI Development”, written by the Technical Governance Team’s Aaron Scher and Lisa Thiergart and originally published in November 2024. For a more technical overview, we recommend the paper’s executive summary.
Here, we’ll cover three illustrative policy goals explored in the paper, along with potential verification methods for each. The three goals involve:
- Tracking the location of AI compute (chips).
- Verifying that tracked compute is not doing large-scale training.
- Certifying model evaluations.
We focus primarily on verification for international governance, and also discuss some promising methods that could serve a range of future verification needs.[1]
Goal One: Tracking AI compute
Modern AIs are trained using highly advanced and specialized chips, usually (but not necessarily) clustered in large datacenters for efficiency. International governance thus requires some way to track these high-end chips. There are low-tech and high-tech ways to do this.
Low tech: In-person inspections. International inspectors might physically visit datacenters (perhaps under an arrangement like the inspection regime of the START nuclear arms treaty) to count chips, examine them for tampering, audit security, and set up cameras for continuous monitoring. These methods require physical access to datacenters, and therefore a tighter, more detailed agreement among participants, but they could be implemented almost immediately with current technology.
High tech: Secure chip governance. Chips can be designed to remotely confirm their location, or retrofitted with hardware that does so. For instance, if a chip regularly pings multiple external servers, the round-trip time of each ping can approximate the chip’s general location on Earth, because no signal travels faster than light. Regulators could require chip manufacturers to install such mechanisms, bypassing the need for ongoing physical access to datacenters. This method requires better tamper-proofing: current chip security measures aren’t designed to stop a well-resourced attacker from simply extracting a chip’s private key and using it to spoof the location.
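The latency idea can be made concrete. The sketch below assumes each landmark server knows its own coordinates and measures a round-trip time (RTT) to the chip; the function names and the example values are hypothetical, not taken from the paper. Since nothing travels faster than light, each RTT caps how far the chip can be from that server, and a claimed location that violates any cap can be rejected. A real scheme would also need the chip to sign each reply with its private key, which is where the tamper-proofing concern comes in.

```python
import math

# Vacuum speed of light in km per millisecond. Signals in optical fiber are
# slower (roughly 2/3 of this), so the vacuum speed gives a conservative bound.
SPEED_OF_LIGHT_KM_PER_MS = 299.792458
EARTH_RADIUS_KM = 6371.0


def max_distance_km(rtt_ms: float) -> float:
    """Farthest a chip can be from a server, given a measured round-trip time."""
    # The signal goes out and comes back, so only half the RTT is one-way travel.
    return SPEED_OF_LIGHT_KM_PER_MS * rtt_ms / 2


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def consistent_with_claim(claimed, measurements) -> bool:
    """Check a claimed (lat, lon) against (server_lat, server_lon, rtt_ms) pings.

    The claim fails if any server lies farther away than light could travel
    in half the measured round-trip time.
    """
    lat, lon = claimed
    return all(
        haversine_km(lat, lon, s_lat, s_lon) <= max_distance_km(rtt_ms)
        for s_lat, s_lon, rtt_ms in measurements
    )


# Hypothetical example: one landmark server in Frankfurt answered in 10 ms.
frankfurt_ping = [(50.11, 8.68, 10.0)]
print(consistent_with_claim((52.52, 13.40), frankfurt_ping))   # Berlin: True
print(consistent_with_claim((1.35, 103.82), frankfurt_ping))   # Singapore: False
```

The check is one-sided: a chip can delay its replies to seem farther away, which only loosens the bounds, so a verifier would want landmark servers on several sides of the claimed location.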
Tracking should focus primarily on chips rather than datacenters, since many datacenters don’t train AIs and the highly bottlenecked chip supply chain is easier to secure.
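Tracking at the chip level could rest on a registry of serial numbers recorded at manufacture, with custody transfers updating each chip’s declared site, so that inspectors’ chip counts can be reconciled against it. The following is a hypothetical sketch; the class, site names, and serials are invented for illustration and do not describe any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class ChipRegistry:
    """Hypothetical registry mapping each chip serial number to its declared site."""

    declared_site: dict = field(default_factory=dict)

    def register(self, serial: str, site: str) -> None:
        """Record a chip at production; serials must be unique."""
        if serial in self.declared_site:
            raise ValueError(f"duplicate serial {serial}")
        self.declared_site[serial] = site

    def transfer(self, serial: str, new_site: str) -> None:
        """Update the declared site when a chip changes hands."""
        if serial not in self.declared_site:
            raise KeyError(f"unknown serial {serial}")
        self.declared_site[serial] = new_site

    def reconcile(self, site: str, observed: set) -> dict:
        """Compare serials that inspectors counted at a site with the registry."""
        expected = {s for s, loc in self.declared_site.items() if loc == site}
        return {
            "missing": expected - observed,     # declared here, not found
            "undeclared": observed - expected,  # found here, declared elsewhere or never registered
        }


registry = ChipRegistry()
for serial in ("A1", "A2", "A3"):
    registry.register(serial, "site-north")
registry.transfer("A3", "site-south")
report = registry.reconcile("site-north", observed={"A1", "Z9"})
# report == {"missing": {"A2"}, "undeclared": {"Z9"}}
```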
Goal Two: Verifying lack of large-scale training
The most capable AIs tend to be those trained at the largest scale, so training compute is a useful, though imperfect, proxy for capabilities in AI governance. Most chips that are used to run AI can also be used for training; if states choose to ban large-scale training runs, and someone insists “I’m only using these chips for inference,” states need a way to verify that claim.
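To see how training compute is usually estimated, a common rule of thumb for dense transformer models is C ≈ 6·N·D floating-point operations (FLOP), where N is the parameter count and D is the number of training tokens. The sketch below applies it, together with a wall-clock estimate; the model size, chip count, per-chip throughput, utilization, and limit are all illustrative assumptions, not figures from the paper.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough training compute for a dense transformer: C ≈ 6 * N * D FLOP."""
    # About 2 FLOP per parameter per token in the forward pass, 4 in the backward pass.
    return 6 * params * tokens


def training_days(flops: float, num_chips: int, peak_flop_per_s: float, utilization: float) -> float:
    """Wall-clock days needed to perform `flops` on a cluster."""
    seconds = flops / (num_chips * peak_flop_per_s * utilization)
    return seconds / 86_400


ILLUSTRATIVE_LIMIT_FLOP = 1e25  # hypothetical treaty threshold

c = training_flops(params=70e9, tokens=15e12)  # about 6.3e24 FLOP
under_limit = c < ILLUSTRATIVE_LIMIT_FLOP
days = training_days(c, num_chips=10_000, peak_flop_per_s=1e15, utilization=0.4)  # about 18 days
```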
Currently, very large datacenters are used to train frontier AIs. Physical inspections and continuous monitoring might confirm that the quantity and type of chips in a datacenter don’t permit large training runs. But an international agreement needs to be robust against attempts to cheat, and monitoring only large datacenters gives would-be cheaters an incentive to distribute training runs across many smaller ones. A more robust approach would rely on chips designed solely for inference, rather than merely specialized for it.[2] For now, most CPUs and non-datacenter GPUs are not powerful enough to require monitoring, though states will eventually need to revise what qualifies as AI compute as algorithms improve.
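The incentive to split training across smaller sites shows up in simple arithmetic. In this hypothetical sketch, every site stays under a per-site limit, yet their combined capacity exceeds it; the site sizes, durations, throughput, and limit are assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def site_capacity_flop(num_chips, peak_flop_per_s, utilization, days):
    """Compute a site could deliver over `days` of sustained use."""
    return num_chips * peak_flop_per_s * utilization * days * SECONDS_PER_DAY


def audit(sites, limit_flop):
    """Compare each site, and all sites together, against the same limit."""
    flagged = [name for name, cap in sites.items() if cap > limit_flop]
    combined = sum(sites.values())
    return flagged, combined > limit_flop


# Hypothetical: 20 small sites, 500 chips each, 90 days of use.
per_site = site_capacity_flop(500, 1e15, 0.4, 90)  # about 1.6e24 FLOP
sites = {f"site-{i}": per_site for i in range(20)}
flagged, combined_exceeds = audit(sites, limit_flop=1e25)
# No single site is flagged, yet together they could exceed the limit.
```

The combined figure overstates what the sites could achieve together, since slower links between sites reduce training efficiency, but it shows why limits applied one datacenter at a time are easy to evade.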
Once datacenters with the capacity for large training runs are identified, states need to verify that such training isn’t taking place. There are a few ways one might do so, for example:
- Require datacenters to log their chip activities for later auditing.
- Re-run some fraction of those activities on mutually trusted hardware.
- Classify workloads based on power draw, bandwidth, and other externally detectable metrics.
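Logging and re-running can be combined. One standard way to make a log tamper-evident is a hash chain, in which each entry’s hash commits to every earlier entry; a verifier can then re-run a sampled job on trusted hardware and compare output digests. This is a sketch under assumptions: the record fields and the `rerun` callable are hypothetical, and it presumes jobs are deterministic enough to reproduce bit-for-bit, which real GPU workloads often are not without extra care.

```python
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(log: list, record: dict) -> None:
    """Add a record whose hash commits to every entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": entry_hash(prev, record)})


def verify_chain(log: list, trusted_head=None) -> bool:
    """Check internal consistency, plus an optional head hash reported earlier.

    Someone who controls the whole log could recompute every hash, so the
    chain is only tamper-evident relative to head hashes the verifier
    already received (e.g. reported periodically).
    """
    prev = GENESIS
    seen = set()
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
        seen.add(prev)
    # A previously reported head must still appear, or history was rewritten or truncated.
    return trusted_head is None or trusted_head in seen


def spot_check(entry: dict, rerun) -> bool:
    """Re-run a logged job on trusted hardware and compare output digests."""
    record = entry["record"]
    return rerun(record["job"]) == record["output_digest"]
```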
Logging and verifying chip workloads can provide strong evidence that the chips weren’t used for illegal training. But since the actual content of the training runs might be sensitive or secret, it helps to have verification options that don’t expose the code, weights, or training data.
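How strong that evidence is depends on how much gets checked. If a datacenter falsifies some of its logged segments and the verifier re-runs a uniform random sample, the chance of catching at least one falsified segment follows the hypergeometric distribution: 1 − C(n−k, m) / C(n, m) for n segments, k falsified, and m sampled. The segment counts below are hypothetical.

```python
from math import comb


def detection_probability(total: int, falsified: int, sampled: int) -> float:
    """Chance that a uniform random sample (without replacement) of logged
    segments includes at least one falsified segment."""
    # comb(n, k) is 0 when k > n, so an oversized sample correctly gives 1.0.
    return 1 - comb(total - falsified, sampled) / comb(total, sampled)


def samples_needed(total: int, falsified: int, target: float) -> int:
    """Smallest sample size whose detection probability reaches `target`."""
    for m in range(total + 1):
        if detection_probability(total, falsified, m) >= target:
            return m
    raise ValueError("target unreachable; is falsified == 0?")


# Hypothetical: 1,000 logged segments, 10 of them falsified, 100 re-run.
p = detection_probability(1000, 10, 100)  # roughly 0.65
m95 = samples_needed(1000, 10, 0.95)
```

Because the required sample grows as falsification becomes sparser, a verifier would likely pair sampling with other checks rather than rely on it alone.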
Fortunately, large-scale training has characteristics that make it stand out. For example, it needs high interconnect bandwidth for communication among clusters of chips, and enforcing limits on this bandwidth is one way to limit large training runs. If granted temporary access to sample workloads from a given datacenter, verifiers can figure out which patterns of power draw and network activity correspond to innocuous chip use, making external monitoring much more effective.
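As a toy version of the bandwidth signal, a verifier might flag periods where inter-chip traffic stays above a cap for an extended run of samples, on the assumption that large training keeps links busy for long stretches while ordinary use produces shorter bursts. This is a hypothetical sketch: the cap, the minimum duration, and the readings are assumptions a verifier would calibrate from sampled workloads, and a real classifier would combine many more signals, such as power draw.

```python
def sustained_exceedances(samples_gbps, cap_gbps, min_samples):
    """Return half-open (start, end) index ranges where traffic stayed above
    `cap_gbps` for at least `min_samples` consecutive samples."""
    runs = []
    start = None
    for i, value in enumerate(samples_gbps):
        if value > cap_gbps:
            if start is None:
                start = i  # a high-traffic run begins
        elif start is not None:
            if i - start >= min_samples:
                runs.append((start, i))
            start = None  # the run ended
    # Close a run that lasts until the final sample.
    if start is not None and len(samples_gbps) - start >= min_samples:
        runs.append((start, len(samples_gbps)))
    return runs


# Hypothetical per-minute readings in GB/s, with a 100 GB/s cap and a 3-minute window.
readings = [10, 500, 600, 700, 20, 900, 10]
flags = sustained_exceedances(readings, cap_gbps=100, min_samples=3)
# flags == [(1, 4)]: the lone spike at index 5 is too short to count.
```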
Securing chips against tampering, even by well-resourced state actors, opens up additional options. For instance, the chips themselves could log and report their activity. In one proposed design, a second computer attached to the chip performs monitoring and other governance functions. That monitor would itself need to be secure, but because its workload is less demanding, it could be easier to secure than a high-end AI chip.
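A chip-attached monitor’s reports are only useful if they cannot be forged or replayed. The sketch below authenticates each report with an HMAC and rejects any report whose sequence number does not advance. It uses a shared symmetric key only to stay within Python’s standard library; a real design would more likely use an asymmetric signing key held inside the secure hardware, so the verifier never holds a secret. The report fields and key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_report(device_key: bytes, report: dict) -> dict:
    """Attach an authentication tag, computed inside the (assumed secure) monitor."""
    body = json.dumps(report, sort_keys=True).encode()
    return {"report": report, "tag": hmac.new(device_key, body, hashlib.sha256).hexdigest()}


class ReportVerifier:
    """Accept reports only if the tag is valid and the sequence number advances."""

    def __init__(self, device_key: bytes):
        self.device_key = device_key
        self.last_seq = -1

    def accept(self, signed: dict) -> bool:
        body = json.dumps(signed["report"], sort_keys=True).encode()
        expected = hmac.new(self.device_key, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signed["tag"]):
            return False  # forged or altered report
        seq = signed["report"]["seq"]
        if seq <= self.last_seq:
            return False  # replayed or reordered report
        self.last_seq = seq
        return True


key = bytes(32)  # hypothetical per-chip key; all zeros only for this demo
verifier = ReportVerifier(key)
r1 = sign_report(key, {"chip": "A1", "seq": 1, "busy_seconds": 3600})
first = verifier.accept(r1)   # valid and new
replay = verifier.accept(r1)  # same report again
tampered = {"report": {"chip": "A1", "seq": 2, "busy_seconds": 0}, "tag": r1["tag"]}
forged_ok = verifier.accept(tampered)  # tag does not match the altered contents
```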
A drawback of indirect methods, such as classifying workloads from externally detectable metrics, is that they are difficult to make adversarially robust, especially given likely advances in algorithmic efficiency and distributed training. Secure chips and trusted compute infrastructure are therefore good targets for proactive research.
Goal Three: Certifying model evaluations
Separately from, or in addition to, limits on training, countries may want to evaluate each other’s AI models. The main difficulties here include:
- Securing the evaluation process.
- Evaluating the correct model.
- Running effective evaluations.
Evaluations might be made more secure by running only mutually approved code in Trusted Execution Environments (TEEs).[3] For instance, a verifier might require proof that the model code and weights running in a TEE during evaluation are identical to those used in training or deployment, which could catch attempts to sneak a weaker model into the evaluation.
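Checking that the evaluated model is the right one can reduce to comparing cryptographic digests of the weights. In this sketch, the developer commits to a SHA-256 digest of the weights during training, and the evaluation environment recomputes it before running any tests. In a real TEE the digest would come from the hardware’s attestation report rather than from ordinary code; the function names and file path here are hypothetical.

```python
import hashlib
import hmac


def measure(chunks) -> str:
    """SHA-256 digest over model weight bytes, streamed chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path: str, size: int = 1 << 20):
    """Yield a weights file in 1 MiB pieces so it never sits fully in memory."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(size), b"")


def same_model(committed_digest: str, chunks) -> bool:
    """True if the weights loaded for evaluation match the earlier commitment."""
    return hmac.compare_digest(measure(chunks), committed_digest)


# Hypothetical usage: same_model(published_digest, file_chunks("model-weights.bin"))
```

A matching digest shows byte-for-byte identity and nothing more: the same model re-serialized or re-sharded would hash differently, so the parties would need to agree on a canonical format.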
While minimal versions of these techniques will likely be available soon, existing AI chips may not be sufficiently secure against well-resourced nation-state actors. Research into secure evaluations should begin before it’s imminently needed.
Even if security and model identification are solved, the science of evaluations is still in its early stages. As of 2026, AI capabilities are still outpacing evaluators’ ability to test them. If frontier AI progress is slowed by international agreement, evaluations might be able to catch up, but for now, evaluating the capabilities of frontier AIs remains a difficult problem.
Other verification mechanisms
Some mechanisms could serve a wide range of policy goals. Whistleblower programs have helped expose misconduct in sensitive industries and could provide additional verification.[4] With the right capabilities and scaffolding, AI-enabled methods could also classify workloads (Goal Two) or conduct evaluations (Goal Three) without leaking sensitive data.
Another option is to require a safety case for AI deployments: a structured and well-evidenced argument that a particular application of AI is safe. Such requirements are common in mature industries such as airlines, oil and gas, and medical devices.
Many policy goals may require monitoring the behavior of every copy of a deployed AI. This is a difficult challenge, but it may be doable if model weights are secured against theft and kept in a small number of datacenters, so that every running copy sits somewhere monitors can reach.
It has become common practice for AI developers to train their AIs on a behavior specification. These “model specs” are far from reliable; despite instructions, AIs can still badly misbehave, and to date there are no defenses that can stop a determined attacker with access to model weights from bypassing specs. But to the extent that model specs can influence AI behavior at all, it makes sense to include agreed-upon principles like “don’t violate international agreements” and to look for better ways to make them stick.
Final thoughts
MIRI has gone to great lengths to communicate that building powerful artificial intelligence under current conditions most likely kills everyone on Earth. Preventing this outcome will likely require a concerted (but not unprecedented) diplomatic effort, backed by robust technical solutions.
If states are able to pull together and commit to an international agreement, it seems possible to set up a verification regime and avert global catastrophe. But verification will be far more secure and robust, with less costly tradeoffs, if we do the hard work of figuring out how to implement the most promising methods now. Some core challenges are the increasing pace of algorithmic progress and distributed training, the difficulty of classifying AI chip activities in an adversarial setting, and the novel threat landscape from highly capable AI systems. Proactive research into solving these challenges will better position us for international and domestic regulation alike.
Footnotes
1. Many of these tools can also help domestic regulators enforce laws around AI development, but some will be overkill, because domestic companies tend to be less adversarial and less willing to break laws.
2. Since November 2024, there’s been a growing market for chips specialized for inference, such as Etched’s Sohu and Groq’s chips. These would probably be inefficient at pre-training, if they work for it at all, but a determined state actor might be able to use them anyway. The specialized chips we’re proposing would need to be explicitly designed to prevent that.
3. Existing TEE implementations include NVIDIA’s Confidential Computing and Apple’s Secure Enclave.
4. Prearranged interviews with insiders may also help.

Facts Only

* The Technical Governance Team’s Aaron Scher and Lisa Thiergart wrote an article published in November 2024.
* The article discusses mechanisms for verifying international agreements regarding AI development.
* The primary goals identified are tracking AI compute (chips), verifying the lack of large-scale training, and certifying model evaluations.
* Verification methods include physical inspections of datacenters, secure chip governance, logging chip activities, re-running workloads, and using Trusted Execution Environments (TEEs).
* Tracking AI chips is recommended over tracking datacenters, because many datacenters don’t train AIs and the bottlenecked chip supply chain is easier to secure.
* Training compute is identified as a useful, though imperfect, proxy for AI capabilities, so states need ways to verify claims that chips are not being used for large-scale training.
* Methods for verifying lack of large-scale training include logging chip activities, re-running workloads, and classifying workloads based on metrics.
* Certifying model evaluations involves securing the evaluation process through TEEs.
* The article notes that AI capabilities are outpacing evaluators’ ability to test them and that the science of evaluations is still in its early stages.
* Whistleblower programs and safety case requirements are suggested as potential verification mechanisms.
* The article concludes with a call for proactive research into solving key challenges related to algorithmic progress and distributed training.

Executive Summary

The article, penned by the Technical Governance Team, examines the critical need for international oversight of AI development, focusing specifically on verifying compliance with potential agreements. It outlines three key policy goals: tracking AI compute, verifying the absence of large-scale training, and certifying model evaluations. Verification methods range from physical inspections of AI datacenters, under arrangements similar to the START treaty, to secure chip designs with remote location verification and activity logging. The authors stress the importance of tracking chips directly, since many datacenters don’t train AIs and the chip supply chain is highly concentrated. They acknowledge the difficulty of verifying claims that chips are used only for inference, and suggest methods like workload logging and re-running activities for audit. Certifying model evaluations relies on Trusted Execution Environments (TEEs) to secure the evaluation process. The article acknowledges that AI capabilities are outpacing evaluators’ ability to test them and that the science of evaluations is still in its early stages. Beyond these goals, it suggests mechanisms like whistleblower programs and safety case requirements, and emphasizes proactive research into the increasing pace of algorithmic progress, distributed training, and the difficulty of classifying chip activity in adversarial settings. The piece ultimately argues for a concerted diplomatic effort backed by robust technical solutions to avert global catastrophe.

Full Take

The article presents a remarkably cautious and strategically layered approach to a problem that, frankly, feels like it’s still largely theoretical. Scher and Thiergart aren’t just laying out technical specs; they’re constructing a defensive architecture against a potential AI apocalypse, and that architecture is built on assumptions that feel brittle. The focus on “chip governance”, particularly the push for chips that confirm their location by pinging external servers, reveals a fundamental paranoia about algorithmic opacity and the potential for state actors to circumvent oversight. It’s a classic “assume the worst” strategy, reflecting a deeply ingrained distrust of powerful AI systems. The recurring concern with verifying claims of inference-only use underscores a crucial tension: verifying *what* is being done with compute is fundamentally more difficult than simply observing its presence. The “motte-and-bailey” tactic is clearly at play here: identifying the most vulnerable point (large-scale training) and building defenses around it, anticipating the inevitable attempts to evade detection. The suggestion that even chips specialized for inference could be repurposed for training by a resourceful actor is a chilling acknowledgement of the potential for asymmetric warfare. Furthermore, the underlying assumption that we can *ever* definitively prove the absence of a large-scale training run, given the complexity of distributed training and potential obfuscation techniques, is profoundly uncertain. This isn’t just about technical limitations; it’s about a fundamental epistemological challenge: how do you verify something you don’t fully understand? The recurring reference to “frontier AI” and the impending technological gap suggests a deeply ingrained anxiety about falling behind, a classic pattern of technological determinism where the future dictates the present. (ARC-0043 Motte-and-Bailey, ARC-0024 Ambiguity).
The entire effort feels like a preemptive containment strategy, driven by a narrative of existential risk, lacking a clear articulation of the actual threats posed by AI. (ARC-0017 Systemic – Mission Drift).
