What I Would Do If I Were Working On AI Governance

I don’t work in AI governance, and am unlikely to do so in the future. But various anecdotes and, especially, Akash’s recent discussion leave me with the impression that few-if-any people are doing the sort of things which I would consider sensible starting points, and instead most people are mostly doing things which do not seem-to-me to address any important bottleneck to useful AI governance.

So this post lays out the places I would start, if I were working on AI governance, and some of the reasoning behind them.

No doubt I am missing lots of important things! Perhaps this post will nonetheless prove useful to others working in AI governance, perhaps Cunningham’s Law will result in me learning useful things as a result of this post, perhaps both. I expect that the specific suggestions in this post are more likely to be flawed than the style of reasoning behind them, and I therefore recommend paying more attention to the reasoning than the specific suggestions.

This post will be mostly US-focused, because that is what I know best and where all the major AI companies are, but presumably versions of the interventions discussed could also carry over to other polities.

Liability

One major area I’d focus on is making companies which build AI liable for the damages caused by that AI, both de-facto and de-jure.

Why Liability?

The vague goal here is to get companies which build AI to:

  • Design from the start for systems which will very robustly not cause problems.

  • Invest resources in red-teaming, discovering new failure-modes before they come up in production, etc.

  • Actually not deploy systems which raise red flags, even when the company has invested heavily in building those systems.

  • In general, act as though the company will take losses from damages caused by their AI, not just capture profits from the benefits caused by their AI.

… and one natural way to do that is to ensure that companies do, in fact, take losses from damages caused by their AI, not just capture profits from the benefits caused by their AI. That’s liability in a nutshell.

Now, realistically, this is not going to extend all the way to e.g. making companies buy extinction insurance. So why do realistic levels of liability matter for extinction risk? Because they incentivize companies to put in place safety processes with any actual teeth at all.

For instance: right now, lots of people are working on e.g. safety evals. My very strong expectation is that, if and when those evals throw red flags, the major labs will respond by some combination of (1) having some meetings where people talk about safety a bunch, (2) fine-tuning until the red flags are no longer thrown (in a way which will obviously not robustly remove the underlying problems), and then (3) deploying it anyway, under heavy pressure from the CEO of Google/​Microsoft/​Amazon and/​or Sam Altman. (In particular, the strongest prediction here is that the models will somehow end up deployed anyway.)

On the other hand, if an AI company has already been hit with lots of expensive lawsuits for problems caused by their AI, then I expect them to end up with a process which will test new models in various ways, and then actually not deploy them if red flags come up. They will have already done the “fine tune until red light stops flashing” thing a few times, and paid for it when their fine tuning failed to actually remove problems in deployment.

Another way to put it: liability forces a company to handle the sort of organizational problems which are a central bottleneck to making any sort of AI safety governance basically-real, rather than basically-fake. It forces the company to build the organizational infrastructure/processes needed for safety mechanisms with teeth.

For a great case study of how liability solved a similar problem in another area, check out Jason Crawford’s How Factories Were Made Safe. That was the piece which originally put this sort of strategy on my radar for AI.

How Liability?

Now on to the sorts of things that I’d work on day-to-day to achieve that vague vision.

Broadly speaking, I see three paths:

  • The judicial path: establish legal precedents in which AI companies are held liable for damages

  • The regulatory path: get regulatory agencies which maintain liability-relevant rules to make rule clarifications or even new rules under which AI companies will be unambiguously liable for various damages

  • The legislative path: get state and/​or federal lawmakers to pass laws making AI companies unambiguously liable for various damages

In order to ensure de-facto liability, all of these paths should also be coupled with an org which actively searches out people with claims against AI companies, and provides lawyers to pursue those claims.

Given that all of the paths require an org which actively searches out people with claims and provides lawyers to pursue those claims, the judicial path is the obvious one to start with, since it requires exactly that infrastructure and nothing more. The only tweak is that the org would also actively look for good test-cases and friendly jurisdictions in which to prosecute them.

Depending on the available resources, one could also pursue this sort of strategy by finding an existing law firm that’s good at this sort of thing, and simply subsidizing their cases in exchange for a focus on AI liability.

The sort of cases I’d (low-confidence) expect to pursue relatively early on would be things like:

  • Find a broad class of people damaged in some way by hallucinations, and bring a class-action suit against the company which built the large language model.

  • Find some celebrity or politician who’s been the subject of a lot of deepfakes, and bring a suit against the company whose model made a bunch of them.

  • Find some companies/​orgs which have been damaged a lot by employees/​contractors using large language models to fake reports, write-ups, etc, and then sue the company whose model produced those reports/​write-ups/​etc.

The items lower down on this list would (I would guess) establish stronger liability standards for the AI companies, since they involve other parties clearly “misusing” the model (who could plausibly shoulder most of the blame in place of the AI company). If successful, such cases would therefore be useful precedents, incentivizing AI companies to put in place more substantive guardrails against misuse, since the AI company would be less able to shove off liability onto “misusers”. (As per the previous section, this isn’t the sort of thing which would immediately matter for extinction risk, but is rather intended to force companies to put any substantive guardrails in place at all.)

Note that a major challenge here is that one would likely be directly fighting the lawyers of major tech companies. That said, I expect there are law firms whose raison d'être is to fight large companies in court, and plenty of judges who do not like major tech companies.

Regulatory Survey and Small Interventions

There are presumably dozens of federal agencies working on rulemaking for AI right now. One obvious-to-me thing to do is to read through all of the rule proposals and public-comment-periods entering the Federal Register, find any relevant to AI (or GPUs/cloud compute), and simply submit comments on them arguing for whatever versions of the proposed rules would most mitigate AI extinction risk.

Some goals here might be:

  • Push for versions of rules around AI which have real teeth, i.e. they’re not just some paperwork.

  • Push for versions of rules which target relatively-general AI specifically, and especially new SOTA training runs.

  • Make GPUs more scarce in general (this one is both relatively more difficult and relatively more “anticooperative”; I’m unsure whether I’d want to prioritize that sort of thing).

… but mostly I don’t currently know what kinds of rules are in the pipe, so I don’t know what specific goals I would pursue. A big part of the point here is just to orient to what’s going on, get a firehose of data, and then make small pushes in a lot of places.

Note that this is the sort of project which large language models could themselves help with a lot—e.g. in reading through the (very long) daily releases of the Federal Register to identify relevant items.
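
For concreteness, here is a rough sketch of the sort of first-pass filter I have in mind. It assumes the Federal Register's public JSON API works roughly as its developer docs describe; the keyword list and the `screen` function are purely illustrative placeholders for wherever an LLM (or a human skimmer) would actually slot in, not a real pipeline.

```python
# Rough sketch: pull recent Federal Register documents matching AI-related
# keywords as a first-pass filter, before handing candidates to an LLM or a
# human for relevance screening. Endpoint and parameters assume the public
# Federal Register JSON API (https://www.federalregister.gov/developers/);
# the keyword list and the screen() stub are hypothetical.
import requests

FR_API = "https://www.federalregister.gov/api/v1/documents.json"
# Hypothetical starting list of search terms.
KEYWORDS = ["artificial intelligence", "machine learning",
            "foundation model", "graphics processing unit", "cloud computing"]

def recent_candidates(term: str, since: str = "2024-01-01") -> list[dict]:
    """Return recent Federal Register documents whose full text matches `term`."""
    params = {
        "conditions[term]": term,
        "conditions[publication_date][gte]": since,
        "per_page": 100,
        "order": "newest",
    }
    resp = requests.get(FR_API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("results", [])

def screen(doc: dict) -> bool:
    """Placeholder for the actual relevance judgment -- in practice this is
    where an LLM or human would read the abstract and decide whether the item
    touches relatively-general AI, SOTA training runs, or GPU/compute supply."""
    abstract = (doc.get("abstract") or "").lower()
    return any(k in abstract for k in ("artificial intelligence", "machine learning"))

if __name__ == "__main__":
    seen = set()
    for term in KEYWORDS:
        for doc in recent_candidates(term):
            url = doc.get("html_url")
            if url in seen:
                continue
            seen.add(url)
            if screen(doc):
                print(doc.get("publication_date"), doc.get("type"), doc.get("title"), url)
```

The point is not this particular script; it's that the daily firehose is machine-readable, so keeping up with it is mostly a tooling problem rather than a staffing problem.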

Insofar as I wanted to engage in typical lobbying, e.g. higher-touch discussion with regulators, this would also be my first step. By getting a broad view of everything going on, I’d be able to find the likely highest-impact new rules to focus on.

In terms of implementation, I would definitely not aim to pitch rulemakers on AI X-risk in general (at least not as part of this strategy). Rather, I’d focus on:

  • Deeply understanding the proposed rules and the rulemakers’ own objectives

  • “Solving for the equilibrium” of the incentives they create (in particular looking for loopholes which companies are likely to exploit)

  • Suggesting implementation details which e.g. close loopholes or otherwise tweak incentives in ways which both advance X-risk reduction goals and fit the rulemakers’ own objectives

Another thing I’d be on the lookout for is hostile activity—i.e. lobbyists for Google, OpenAI/​Microsoft, Amazon, Facebook, etc trying to water down rules.

An aside: one thing I noticed during The Great Air Conditioner Debate of 2022 was that, if the reader actually paid attention and went looking for bullshit from the regulators, it was not at all difficult to tell what was going on. The regulators’ write-up from the comment period said more-or-less directly that a bunch of single-hose air conditioner manufacturers claimed their sales would be killed by the originally-proposed energy efficiency reporting rules. In response:

However, as discussed further in section III.C.2, section III.C.3, and III.H of this final rule, the rating conditions and SACC calculation proposed in the November 2015 SNOPR mitigate De’ Longhi’s concerns. DOE recognizes that the impact of infiltration on portable AC performance is test-condition dependent and, thus, more extreme outdoor test conditions (i.e., elevated temperature and humidity) emphasize any infiltration related performance differences. The rating conditions and weighting factors proposed in the November 2015 SNOPR, and adopted in this final rule (see section III.C.2.a and section III.C.3 of this final rule), represent more moderate conditions than those proposed in the February 2015 NOPR. Therefore, the performance impact of infiltration air heat transfer on all portable AC configurations is less extreme. In consideration of the changes in test conditions and performance calculations since the February 2015 NOPR 31 and the test procedure established in this final rule, DOE expects that single-duct portable AC performance is significantly less impacted by infiltration air.

In other words, the regulators’ write-up itself helpfully highlighted exactly where the bullshit was in their new formulas: some change to the assumed outdoor conditions. I took a look, and that was indeed where the bullshit was: the modified energy efficiency standards used a weighted mix of two test conditions, with 80% of the weight on conditions in which outdoor air is only 3°F/1.6°C hotter than indoor air. That weighting makes single-hose air conditioners look far less inefficient (relative to two-hose) than they are under the more realistic hot conditions in which one would actually use a portable air conditioner.
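
To make the arithmetic concrete, here's a toy calculation. The 0.2/0.8 weights and the two outdoor test temperatures match my reading of the final rule; the capacity numbers are invented purely to show how the weighting hides the single-hose penalty, they are not measured values.

```python
# Toy illustration: seasonally-weighted capacity with 20% weight on the hot
# test condition (95 F outdoor) and 80% on the mild one (83 F outdoor vs
# 80 F indoor). All capacity figures below are made-up, for illustration only.
def weighted_capacity(capacity_hot: float, capacity_mild: float,
                      w_hot: float = 0.2, w_mild: float = 0.8) -> float:
    return w_hot * capacity_hot + w_mild * capacity_mild

# Hypothetical capacities (BTU/hr): the single-hose unit bleeds capacity to
# infiltration on a hot day, but barely suffers when outdoor ~= indoor.
dual_hose   = weighted_capacity(capacity_hot=9000, capacity_mild=9500)
single_hose = weighted_capacity(capacity_hot=6000, capacity_mild=9000)

print(dual_hose, single_hose)  # 9400.0 vs 8400.0: only ~11% apart on paper
print(9000, 6000)              # but ~33% apart on an actually-hot day
```

With 80% of the weight on the near-equal-temperature condition, the infiltration penalty that dominates real-world use barely shows up in the headline number.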

Getting back to the main thread: the point is, insofar as that case-study is representative, it’s not actually all that hard to find where the bullshit is. The regulators want their beneficiaries to be able to find the bullshit, so the regulators will highlight the bullshit themselves.

But that’s a two-edged sword. One project I could imagine taking on would be to find places where AI companies’ lobbyists successfully lobbied for some bullshit, and then simply write up a public post which highlights the bullshit for laymen. I could easily imagine such a post generating the sort of minor public blowback to which regulators are highly averse, thereby incentivizing less bullshit AI rules from regulators going forward. (Unfortunately it would also incentivize the regulators being more subtle in the future, but we’re not directly talking about aligning a human-level-plus AI here, so that’s plausibly an acceptable trade-off.)

Agencies Which Enforce Secrecy

I’m not going to go into much detail in this section, largely because I don’t know much of the relevant detail. That said, it sure does seem like:

  • Intelligence agencies (and intelligence-adjacent agencies, e.g. the people who make sure nuclear secrets stay secret) are uniquely well-equipped in terms of the operational capabilities needed to prevent dangerous AI from being built.

  • Intelligence agencies (and intelligence-adjacent agencies) guard an awful lot of secrets which AI makes a lot easier to unearth.

I’ve heard many times that there’s tons of nominally-classified information on the internet, but it’s not particularly easy to find and integrate. Large language models seem like a pretty ideal tool for that.

So it seems like there’s some well-aligned incentives here, and possibly all that’s needed is for someone to make it very obvious to intelligence agencies that they need to be involved in the security of public-facing AI.

I don’t know how much potential there is here or how best to tap it. If I were carrying out such a project, step 1 would be to learn a lot and talk to people with knowledge of how the relevant institutions work.

Ambitious Legislation

I’d rather not tackle ambitious legislative projects right out the gate, at least not before a regulatory survey; it is the most complex kind of governance project by a wide margin. But if I were to follow that path, here are the bottlenecks I’d expect and how I’d tackle them.

Public Opinion (And Perception Thereof)

For purposes of legislation in general, I see public opinion as a form of currency. Insofar as one’s legislative agenda directly conflicts with the priorities of tech companies’ lobbyists, or requires lots of Nominally Very Important People to pay attention (e.g. to create whole new agencies), one will need plenty of that currency to spend.

It does seem like the general public is pretty scared of AGI in general. Obviously typical issues like jobs, racism/​wokeness, etc, are still salient, but even straight-up X-risk seems like a place where the median voter has views not-too-dissimilar to Lesswrongers once the issue is brought to their attention at all. THIS DOES NOT LEAD TO RISING PROPERTY VALUES IN TOKYO seems to be a pretty common intuition.

One thing to emphasize here is that perception of public opinion matters at least as much as public opinion itself, for purposes of this “currency”. We want policymakers to know that the median voter is pretty scared of AGI in general. So there’s value in things like e.g. surveys, not just in outright public outreach.

Legislation Design

Once there’s enough public-opinion-currency to make an ambitious legislative project viable, the next big step is to draft legislation which would actually do what we want.

(There’s an intermediate step of figuring out which vague goals to aim for—e.g. licensing, a moratorium, etc—but I do not expect that step to be a major bottleneck, and in any case one should mostly spend time on draft legislation and then update the high-level target if and when one discovers barriers along the way.)

This step is where most of the cognitive work would happen. It’s a tricky design problem of:

  • Figuring out who in the federal government needs to do what

  • Figuring out what legislative language will cause them to do that

  • Thinking through equilibrium behavior of regulators, companies, and researchers under the resulting incentives

  • Iterating on all that

This obviously requires pretty detailed knowledge of who does what in existing agencies, how new agencies are founded and operate, who’s responsible for all that, etc. Lots of figuring out who actually does what.

I expect this step is currently the primary bottleneck, and where most of the value is.

Selling It

The next step would be pitching that legislation to people who can make it law. This is the step which I expect gets a LOT easier as public-opinion-currency increases. With enough pressure from the median voter, congressional offices will be actively searching around for proposals, and anyone standing around with actual draft legislation will be in high demand. On the other hand, if there’s relatively little public opinion pressure, then one would need to rely a lot more on “inside game”. If I were following this path, I’d definitely be aiming to rely relatively heavily on public opinion rather than inside game, but I’d contract 2-3 people with inside-game know-how and consult them independently to make sure I wasn’t missing anything crucial.

Concluding Thoughts

As mentioned at the start, there’s probably stuff in here that I’m just completely wrong about. But it hopefully gives a sense of the kind of approach and mental models I’d use. In particular, note:

  • A focus on bottlenecks, not just marginal impact.

  • A focus on de-facto effects and equilibrium behavior under incentives, not just symbolic rules.

  • A focus on figuring out the details of the processes for both creating and enforcing rules, i.e. which specific people do which specific things.

A final note: this post did not talk about which governance projects I would not allocate effort to, or why. If you’re curious about particular projects or classes of project which the post ignored, feel free to leave a comment.