Originally posted on my blog, 24 Jun 2025 - see also full PDF (26 pp)
Abstract
The V&V (Verification and Validation) method is a concrete, practical framework that can complement several alignment approaches. Instead of asking a nascent AGI to “do X,” we instruct it to design and rigorously verify a bounded “machine-for-X”. The machine (e.g. an Autonomous Vehicle or a “machine” for curing cancer) is prohibited from uncontrolled self-improvement: every new version must re-enter the same verification loop. Borrowing from safety-critical industries, the loop couples large-scale, scenario-based simulation with coverage metrics and a safety case that humans can audit. Human operators, supported by transparent evidence, retain veto power over deployment.
The method proceeds as shown in the diagram in the full PDF: the AGI designs a bounded machine-for-X, the machine runs through scenario-based, coverage-driven simulation, the evidence is assembled into a safety case, and human operators audit that case before approving deployment; every revision re-enters the same loop.
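To make the control flow concrete, here is a minimal, runnable Python sketch of the loop. Every function body is a toy stand-in for the real tooling (large-scale simulators, coverage collectors, audit dashboards), and the names and threshold are assumptions for illustration, not an actual implementation:

```python
COVERAGE_GOAL = 0.95  # assumed threshold; real projects set domain-specific goals

def design_machine(spec):
    """Stand-in for the AGI's design step: a bounded, task-specific machine."""
    return {"spec": spec, "version": 1}

def run_scenarios(machine):
    """Stand-in for large-scale, scenario-based simulation."""
    return [{"scenario": i, "passed": True} for i in range(100)]

def coverage_of(traces):
    """Stand-in coverage metric: fraction of the scenario space exercised."""
    return len({t["scenario"] for t in traces}) / 100

def build_safety_case(machine, traces, cov):
    """Bundle the evidence into a human-auditable safety case."""
    return {"machine": machine, "coverage": cov, "n_traces": len(traces)}

def human_approves(case):
    """Human operators retain veto power over deployment."""
    return case["coverage"] >= COVERAGE_GOAL

def vv_loop(spec):
    machine = design_machine(spec)
    while True:
        traces = run_scenarios(machine)
        case = build_safety_case(machine, traces, coverage_of(traces))
        if human_approves(case):
            return machine  # approved for bounded deployment
        # Any revision is a *new version* and re-enters the same loop;
        # uncontrolled self-improvement has no path around this gate.
        machine = {**machine, "version": machine["version"] + 1}
```

The point is structural: a new version has no route to deployment except back through the same simulations, the same coverage metrics, and the same human gate.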
This method is not a silver bullet: it still relies on aligned obedience and on incentive structures that deter multi-agent collusion, and it tolerates only partial assurance under tight competitive timelines. Yet it complements existing scalable-oversight proposals, bolstering Constitutional AI, IDA, and CAIS with holistic system checks, and it offers a practical migration path because industries already practice V&V for complex AI today. In the critical years when AGI first appears, the V&V method could buy humanity time, reduce specification gaming, and focus research attention on the remaining hard gaps such as strategic deception and corrigibility.
TL;DR
Ask AGI to build & verify a task-specific, non-self-improving machine
Scenario-based, coverage-driven simulations + safety case give humans transparent evidence (see the coverage sketch after this list)
Main contributions: Complements Constitutional AI, improves human efficiency in Scalable Oversight, handles “simpler” reward hacking systematically, bolsters IDA and CAIS
Works today in Autonomous Vehicles. Scales with smarter “AI workers”
Limits: Needs aligned obedience & anti-collusion incentives
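To give a flavor of what “coverage-driven” means here, below is a toy functional-coverage model in the spirit of hardware verification: bin two parameters of a hypothetical AV cut-in scenario and measure which bins the randomized suite has actually exercised. The parameters, bin sizes, and run count are illustrative assumptions:

```python
import random

# Hypothetical coverage model for an AV "cut-in" scenario:
SPEED_BINS = range(0, 40, 5)    # ego speed, m/s, in 5 m/s bins
GAP_BINS = range(0, 100, 10)    # cut-in gap, m, in 10 m bins

def bin_index(value, bins):
    """Map a continuous parameter value to its coverage bin."""
    return min(int(value // bins.step), len(bins) - 1)

hit = set()
for _ in range(2_000):  # randomized scenario runs
    speed = random.uniform(0, 40)
    gap = random.uniform(0, 100)
    hit.add((bin_index(speed, SPEED_BINS), bin_index(gap, GAP_BINS)))

total = len(SPEED_BINS) * len(GAP_BINS)
print(f"functional coverage: {len(hit)}/{total} bins = {len(hit) / total:.1%}")
```

Unhit bins point at scenario classes the suite never tried, which is exactly the kind of transparent, checkable evidence the safety case is built from.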
About the author
Yoav Hollander led the creation of three V&V standards (IEEE 1647 “e”, Accellera PSS, ASAM OpenSCENARIO DSL) and is now trying to connect some of these ideas with AGI safety.
I got several variants of the following question: “You say you have no solution for scheming, which could make the V&V method unworkable. Why do you then claim it is a practical method?”
Here is my answer: Scheming could in theory break any oversight method, but the V&V method does make several anti-scheming contributions:
(1) Its scenario-based, coverage-driven tests flush out most specification-gaming tricks (a milder form of reward hacking) automatically, essentially treating them as bugs (see § 4.2).
(2) It plugs into other frameworks; e.g. in CAI it supplies the “external reality check” that prevents model collapse (see § 3.1).
(3) Every run produces a human-auditable safety case and logged sim traces, and it is easy to request virtual sims of new scenarios, so spot-audits are cheap, giving schemers a constant risk of exposure (see the sketch below).
(4) These spot-audits can make various anti-collusion approaches (such as whistle-blower incentives, rotation, and diversity) more effective.
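Here is a sketch of what such a spot-audit could look like, assuming logged traces carry enough information to replay their scenarios; the names and data shapes are hypothetical:

```python
import random

def spot_audit(logged_traces, simulate, sample_size=20):
    """Replay a random sample of logged runs; flag any log/replay mismatch."""
    sample = random.sample(logged_traces, k=min(sample_size, len(logged_traces)))
    return [t for t in sample if simulate(t["scenario"]) != t["outcome"]]

# Toy usage: a log of 500 runs checked against a reference simulator.
log = [{"scenario": i, "outcome": i % 7 != 0} for i in range(500)]
print(spot_audit(log, simulate=lambda s: s % 7 != 0))  # -> [] (no mismatches)
```

Because the sample is random and replays are cheap, a would-be schemer never knows which runs will be inspected.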
That is why I still call it a practical layer, even while we attack scheming head-on elsewhere. I would love to hear opinions on these points.