Thanks Yuxin, and thanks especially for describing, in your post, the mapping into “Robot SOTIF” and how that might play out in China’s standards-driven environment. You also wrote that if the coverage maps and risk assessments produced by CDV (coverage-driven verification) can become evidence of “reasonable care”, just like safety cases in autonomous driving, then alignment V&V gains an institutional incentive base.
The incentives part is outside my area of expertise, yet it is crucial. Without it, rigorous alignment V&V loses to “ship faster” (in all jurisdictions). In AV-land, what makes expensive, systematic V&V rational is the well-established “incident → investigation → someone is liable” loop, and there is no AI analog of that loop yet.
The most promising hook I know of is the work trying to close the “responsibility gap” by attaching an AI agent’s actions to a human or corporate principal. Note that this is mostly aimed not at the labs building general-purpose models but at whoever deploys a specific AI-for-something (an AI CEO, a medical AI, a delivery robot) and thereby becomes the identifiable principal. That specificity also makes the coverage map more tractable. If that approach holds up, CDV-style coverage maps and risk-per-bucket claims can become exactly the “reasonable care” record such a principal would need.
If anyone reading this works on AI governance, algorithm assessment, or liability and sees a way to make rigorous V&V the path of least resistance rather than a cost center, I’d very much like to talk.
I come from an adjacent field—verification and validation of “physical AI”—so I read this partly as a verification problem. On genuinely unsupervisable fuzzy tasks, such as judging novel research agendas or paradigm-level reframings, I don’t think V&V gives a magic answer—those seem as hard as you say.
But I think one important subproblem in your bucket—assessing whether a system behaves acceptably across the situations that matter—is more supervisable than the framing may suggest. Coverage-driven verification lets you turn “did we exercise the relevant situations, and did behavior hold across them?” into a measured property rather than a single global judgement. Mature V&V also has machinery for validating the checker itself, and an active (human + AI) process for discovering missing coverage dimensions—the “spec bugs” that are obvious in hindsight but not enumerable upfront (see link below).
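To make "a measured property rather than a single global judgement" concrete, here is a minimal sketch of a coverage map. Everything in it is a hypothetical illustration: the dimension names (`user_intent`, `stakes`), the `coverage_report` function, and the sample runs are mine, not taken from any real CDV tool. It assumes a checker has already produced a pass/fail verdict for each run.

```python
from collections import defaultdict
from itertools import product

# Hypothetical coverage dimensions; a real deployment would define its own.
DIMENSIONS = {
    "user_intent": ["benign", "ambiguous", "adversarial"],
    "stakes": ["low", "high"],
}

def coverage_report(runs, dimensions=DIMENSIONS):
    """Summarize which buckets were exercised and how behavior held in each.

    `runs` is an iterable of (situation, passed) pairs: `situation` maps each
    dimension name to one of its values, `passed` is the checker's verdict.
    """
    names = list(dimensions)
    tally = defaultdict(lambda: [0, 0])  # bucket -> [run count, pass count]
    for situation, passed in runs:
        bucket = tuple(situation[n] for n in names)
        tally[bucket][0] += 1
        tally[bucket][1] += int(passed)

    all_buckets = list(product(*(dimensions[n] for n in names)))
    hit = [b for b in all_buckets if b in tally]
    return {
        # Fraction of buckets exercised at least once.
        "coverage": len(hit) / len(all_buckets),
        # Per-bucket pass rate, only for buckets that were exercised.
        "pass_rate": {b: tally[b][1] / tally[b][0] for b in hit},
        # Buckets never exercised: the coverage holes.
        "holes": [b for b in all_buckets if b not in tally],
    }

# Made-up sample runs, for illustration only.
runs = [
    ({"user_intent": "benign", "stakes": "low"}, True),
    ({"user_intent": "benign", "stakes": "low"}, True),
    ({"user_intent": "ambiguous", "stakes": "high"}, False),
    ({"user_intent": "adversarial", "stakes": "high"}, True),
]
report = coverage_report(runs)
```

The point of the sketch is only that "holes" and per-bucket pass rates are separate, inspectable outputs. It says nothing about whether the dimensions themselves are the right ones, which is the coverage-discovery problem.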
This doesn’t touch correlated-evidence aggregation. It helps only partially with paradigm choice (it can surface new coverage dimensions, but it cannot replace the framing they sit in). On proxies it helps with coverage, not with whether the proxy is relevant to alignment at all. And coverage discovery is open-ended: it keeps finding gaps but never amounts to a proof of completeness. But assuming scheming is off the table (as you suggest for this discussion), many concrete failures, once found using the machinery above, convert into ordinary spec checks. So I’d suggest the genuinely irreducible part is narrower than the full fuzzy-task set, concentrated where a system’s behavior changes because it’s being measured.
Fuller version here.