Former AI safety research engineer, now AI governance researcher at OpenAI. Blog: thinkingcomplete.com
Richard_Ngo
I disagree with the first one. I think that the spectrum of human-level AGI is actually quite wide, and that for most tasks we’ll get AGIs that are better than most humans significantly before we get AGIs that are better than all humans. But the latter is much more relevant for recursive self-improvement, because it’s bottlenecked by innovation, which is driven primarily by the best human researchers. E.g. I think it’d be pretty difficult to speed up AI progress dramatically using millions of copies of an average human.
Also, by default I think people talk about FOOM in a way that ignores regulations, governance, etc. Whereas in fact I expect these to put significant constraints on the pace of progress after human-level AGI.
If we have millions of copies of the best human researchers, without governance constraints on the pace of progress… then compute constraints become the biggest bottleneck. It seems plausible that you get a software-only singularity, but it also seems plausible that you need to wait for AI-driven innovation in chip manufacturing to actually cash out in the real world.
I broadly agree with the second one, though I don’t know how many people there are left with 30-year timelines. But 20 years to superintelligence doesn’t seem unreasonable to me (though it’s above my median). In general I’ve updated lately that Kurzweil was more right than I used to think about there being a significant gap between AGI and ASI. Part of this is because I expect the problem of multi-agent credit assignment over long time horizons to be difficult.
In the last 24 hours. I read fast (but also skipped the last third of The Doomsday Machine).
This comment prompted me to read both Secrets and also The Doomsday Machine by Ellsberg. Both really great, highly recommend.
I think “being the kind of agent who survives the selection process” can sometimes be an important epistemic thing to consider
I’m not claiming it’s zero information, but there are lots of things that convey non-zero information which it’d be bad to set disclosure norms based on. E.g. “I’ve only ever worked at nonprofits” should definitely affect your opinion of someone’s epistemics (e.g. when they’re trying to evaluate corporate dynamics) but once we start getting people to disclose that sort of thing there’s no clear stopping point. So mostly I want the line to be “current relevant conflicts of interest”.
Oops, good catch.
But I also think that one of the reasons why Richard still works at OpenAI is because he’s the kind of agent who genuinely believes things that tend to be pretty aligned with OpenAI’s interests, and I suspect his perspective is informed by having lots of friends/colleagues at OpenAI.
Added a disclaimer, as suggested. It seems like a good practice for this sort of post. Though note that I disagree with this paragraph; I don’t think “being the kind of agent who X” or “being informed by many people at Y” are good reasons to give disclaimers. Whereas I do buy that “they filter out any ideas that they have that could get them in trouble with the company” is an important (conscious or unconscious) effect, and worth a disclaimer.
I’ve also added this note to the text:
Note that most big companies (especially AGI companies) are strongly structurally power-seeking too, and this is a big reason why society at large is so skeptical of and hostile to them. I focused on AI safety in this post both because companies being power-seeking is an idea that’s mostly “priced in”, and because I think that these ideas are still useful even when dealing with other power-seeking actors.
No legible evidence jumps to mind, but I’ll keep an eye out. Inherently this sort of thing is pretty hard to pin down, but I do think I’m one of the handful of people that most strongly bridges the AI safety and accelerationist communities on a social level, and so I get a lot of illegible impressions.
Presumably, at some point, some groups start advocating for specific policies that go against the e/acc worldview. At that point, it seems like you get the organized resistance.
My two suggestions:
People stop aiming to produce proposals that hit almost all the possible worlds. By default you should design your proposal to be useless in, say, 20% of the worlds you’re worried about (because trying to get that last 20% will create really disproportionate pushback); or design your proposal so that it leaves 20% of the work undone (because trusting that other people will do that work ends up being less power-seeking, and more robust, than trying to centralize everything under your plan). I often hear people saying stuff like “we need to ensure that things go well” or “this plan needs to be sufficient to prevent risk”, and I think that mindset is basically guaranteed to push you too far towards the power-seeking end of the spectrum. (I’ve added an edit to the end of the post explaining this.)
As a specific example of this, if your median doom scenario goes through AGI developed/deployed by centralized powers (e.g. big labs, govts) I claim you should basically ignore open-source. Sure, there are some tail worlds where a random hacker collective beats the big players to build AGI; or where the big players stop in a responsible way, but the open-source community doesn’t; etc. But designing proposals around those is like trying to put out candles when your house is on fire. And I expect there to be widespread appetite for regulating AI labs from govts, wider society, and even labs themselves, within a few years’ time, unless those proposals become toxic in the meantime—and making those proposals a referendum on open-source is one of the best ways I can imagine to make them toxic.
(I’ve talked to some people whose median doom scenario looks more like Hendrycks’ “natural selection” paper. I think it makes sense by those people’s lights to continue strongly opposing open-source, but I also think those people are wrong.)
I think that the “we must ensure” stuff is mostly driven by a kind of internal alarm bell rather than careful cost-benefit reasoning; and in general I often expect this type of motivation to backfire in all sorts of ways.
In a world where AI safety folks didn’t say/do anything about OS, I would still suspect clashes between e/accs and AI safety folks.
There’s a big difference between e/acc as a group of random twitter anons, and e/acc as an organized political force. I claim that anti-open-source sentiment from the AI safety community played a significant role (and was perhaps the single biggest driver) in the former turning into the latter. It’s much easier to form a movement when you have an enemy. As one illustrative example, I’ve seen e/acc flags that are a version of the libertarian flag saying “come and take it [our GPUs]”. These are a central example of an e/acc rallying cry that was directly triggered by AI governance proposals. And I’ve talked to several principled libertarians who are too mature to get sucked into a movement by online meme culture, but who have been swung in that direction due to shared opposition to SB-1047.
Consider, analogously: Silicon Valley has had many political disagreements with the Democrats over the last decade—e.g. left-leaning media has continuously been very hostile to Silicon Valley. But while the incentives to push back were there for a long time, the organized political will to push back has only arisen pretty recently. This shows that there’s a big difference between “in principle people disagree” and “actual political fights”.
I think it’s extremely likely this would’ve happened anyways. A community that believes passionately in rapid or maximally-fast AGI progress already has strong motivation to fight AI regulations.
This reasoning seems far too weak to support such a confident conclusion. There was a lot of latent pro-innovation energy in Silicon Valley, true, but the ideology it gets channeled towards is highly contingent. For instance, Vivek Ramaswamy is a very pro-innovation, anti-regulation candidate who has no strong views on AI. If AI safety hadn’t been such a convenient enemy then plausibly people with pro-innovation views would have channeled them towards something closer to his worldview.
Coalitional agency
I haven’t yet read through them thoroughly, but these four papers by Oliver Richardson are pattern-matching to me as potentially very exciting theoretical work.
tl;dr: probabilistic dependency graphs (PDGs) are directed graphical models designed to be able to capture inconsistent beliefs (paper 1). The definition of inconsistency is a natural one which allows us to, for example, reframe the concept of “minimizing training loss” as “minimizing inconsistency” (paper 2). They provide an algorithm for inference in PDGs (paper 3) and an algorithm for learning via locally minimizing inconsistency which unifies several other algorithms (like the EM algorithm, message-passing, and generative adversarial training) (paper 4).

Oliver is an old friend of mine (which is how I found out about these papers) and a final-year PhD student at Cornell under Joe Halpern.
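To give a flavor of what “inconsistency” means here, a minimal toy sketch of my own (not the definition from the papers): if two sources assert different distributions over the same variable, you can score the conflict by how well any single belief can simultaneously match both assertions.

```python
import math

def kl(q, p):
    """KL divergence D(q || p), with q and p given as dicts over the same outcomes."""
    return sum(q[x] * math.log(q[x] / p[x]) for x in q if q[x] > 0)

# Two sources assert conflicting distributions over the same binary variable.
p1 = {"heads": 0.9, "tails": 0.1}
p2 = {"heads": 0.4, "tails": 0.6}

def inconsistency(p1, p2, steps=10_000):
    """Toy score: how close can a single belief q get to both assertions at once?
    Approximately zero (up to grid resolution) iff the assertions agree."""
    best = float("inf")
    for i in range(1, steps):
        q = {"heads": i / steps, "tails": 1 - i / steps}
        best = min(best, kl(q, p1) + kl(q, p2))
    return best

print(inconsistency(p1, p2))  # positive: the two assertions conflict
print(inconsistency(p1, p1))  # ~0: no conflict
```

The actual PDG machinery handles full directed graphical models rather than a single variable, but this is the basic intuition for why a scalar “degree of inconsistency” is a sensible quantity to minimize.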
I expect neither to work in practice, since I don’t think that either [broad competence of decision-makers] or [increased legitimacy of broad (and broadening!) AIS community] help us much at all in achieving our goals. To achieve our goals, I expect we’ll need something much closer to ‘our’ people in power.
While this seems like a reasonable opinion in isolation, I also read the thread where you were debating Rohin and holding the position that most technical AI safety work was net-negative.
And so basically I think that you, like Eliezer, have been forced by (according to me, incorrect) analyses of the likelihood of doom to the conclusion that only power-seeking strategies will work.
From the inside, for you, it feels like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
From the outside, for me, it feels like “The doomers have a cognitive bias that ends up resulting in them overrating power-seeking strategies, and this is not a coincidence but instead driven by the fact that it’s disproportionately easy for cognitive biases to have this effect (given how the human mind works)”.
Fortunately I think most rationalists have fairly good defense mechanisms against naive power-seeking strategies, and this is to their credit. So the main thing I’m worried about here is concentrating less force behind non-power-seeking strategies.
Yes, I’m saying it’s a reasonable conclusion to draw, and the fact that it isn’t drawn here is indicative of a kind of confirmation bias.
Ah, sorry for the carelessness on my end. But this still seems like a substantive disagreement: you expect
, and I don’t, for the reasons in my comment.
Thanks for the extensive comment! I’m finding this discussion valuable. Let me start by responding to the first half of your comment, and I’ll get to the rest later.
The simplicity of a goal is inherently dependent on the ontology you use to view it through: while $K(f(G)) < K(G)$ is (likely) true, pay attention to how this changes the ontology! The goal of the agent is indeed very simple, but not because the “essence” of the goal simplifies; instead, it’s merely because it gets access to a more powerful ontology that has more detail, granularity, and degrees of freedom. If you try to view $f(G)$ in the old ontology instead of the new one, meaning you look at the preimage $f^{-1}(f(G))$, this should approximately be the same as $G$: your argument establishes no reason for us to think that there is any force pulling the goal itself, as opposed to its representation, to be made smaller.
One way of framing our disagreement: I’m not convinced that the f operation makes sense as you’ve defined it. That is, I don’t think it can both be invertible and map to goals with low complexity in the new ontology.
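To make that concrete (using Kolmogorov complexity as a stand-in for whatever simplicity measure you have in mind, which may not be exactly what you intend): for any computable, invertible $f$,

$$K(G) \;\le\; K(f(G)) + K(f^{-1}) + O(1),$$

so an invertible mapping can only reduce a goal’s complexity by roughly the length of the program needed to undo it, a constant that doesn’t depend on $G$. If arbitrarily complex old-ontology goals end up with genuinely short representations in the new ontology, then the mapping is doing lossy compression and can’t have an inverse.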
Consider a goal that someone from the past used to have, which now makes no sense in your ontology—for example, the goal of reaching the edge of the earth, for someone who thought the earth was flat. What does this goal look like in your ontology? I submit that it looks very complicated, because your ontology is very hostile to the concept of the “edge of the earth”. As soon as you try to represent the hypothetical world in which the earth is flat (which you need to do in order to point to the concept of its “edge”), you now have to assume that the laws of physics as you know them are wrong; that all the photos from space were faked; that the government is run by a massive conspiracy; etc. Basically, in order to represent this goal, you have to set up a parallel hypothetical ontology (or in your terminology, $f(G)$ needs to encode a lot of the content of the old ontology). Very complicated!
I’m then claiming that whatever force pushes our ontologies to simplify also pushes us away from using this sort of complicated construction to represent our transformed goals. Instead, the most natural thing to do is to adapt the goal in some way that ends up being simple in your new ontology. For example, you might decide that the most natural way to adapt “reaching the edge of the earth” means “going into space”; or maybe it means “reaching the poles”; or maybe it means “pushing the frontiers of human exploration” in a more metaphorical sense. Importantly, under this type of transformation, many different goals from the old ontology will end up being mapped to simple concepts in the new ontology (like “going into space”), and so it doesn’t match your definition of $f$.
All of this still applies (but less strongly) to concepts that are not incoherent in the new ontology, but rather just messy. E.g. suppose you had a goal related to “air”, back when you thought air was a primitive substance. Now we know that air is about 78% nitrogen, 21% oxygen, and 0.93% argon. Okay, so that’s one way of defining “air” in our new ontology. But this definition of air has a lot of messy edge cases—what if the ratios are slightly off? What if you have the same ratios, but much different pressures or temperatures? Etc. If you have to arbitrarily classify all these edge cases in order to pursue your goal, then your goal has now become very complex. So maybe instead you’ll map your goal to the idea of a “gas”, rather than “gas that has specific composition X”. But then you discover a new ontology in which “gas” is a messy concept...
If helpful I could probably translate this argument into something closer to your ontology, but I’m being lazy for now because your ontology is a little foreign to me. Let me know if this makes sense.
A more systematic case for inner misalignment
I think this whole debate is missing the point I was trying to make. My claim was that it’s often useful to classify actions which tend to lead you to having a lot of power as “structural power-seeking” regardless of what your motivations for those actions are. Because it’s very hard to credibly signal that you’re accumulating power for the right reasons, and so the defense mechanisms will apply to you either way.
In this case MIRI was trying to accumulate a lot of power, and claiming that they were aiming to use it in the “right way” (do a pivotal act) rather than the “wrong way” (replacing governments). But my point above is that this sort of claim is largely irrelevant to defense mechanisms against power-seeking.
(Now, in this case, MIRI was pursuing a type of power that was too weird to trigger many defense mechanisms, though it did trigger some “this is a cult” defense mechanisms. But the point cross-applies to other types of power that they, and others in AI safety, are pursuing.)
Would you say that “Alice going to a networking event” (assume she’s doing it socially conventional/appropriate ways) would count as structural power-seeking? And would you discourage her from going?
I think you’re doing a paradox of the heap here. One grain of sand is obviously not a heap, but a million obviously is. Similarly, Alice going to one networking event is obviously not power-seeking, but Alice taking every opportunity she can to pitch herself to the most powerful people she can find obviously is. I’m identifying a pattern of behavior that AI safety exhibits significantly more than other communities, and the fair analogy is to a pattern of behavior that Alice exhibits significantly more than other people around her.
I’m also a bit worried about a motte-and-bailey here. The bold statement is “power-seeking (which I’m kind of defining as anything that increases your influence, regardless of how innocuous or socially accepted it seems) is bad because it triggers defense mechanisms”
I flagged several times in the post that I was not claiming that power-seeking is bad overall, just that it typically has this one bad effect.
the more moderated statement is “there are some specific ways of seeking power that have important social costs, and I think that some/many actors in the community underestimate those costs”.
I repudiated this position in my previous comment, where I flagged that I’m trying to make a claim not about specific ways of seeking power, but rather about the outcome of gaining power in general.
e/acc has coalesced in defense of open-source, partly in response to AI safety attacks on open-source. This may well lead directly to a strongly anti-AI-regulation Trump White House, since there are significant links between e/acc and MAGA.
I think of this as a massive own goal for AI safety, caused by focusing too much on trying to get short-term “wins” (e.g. dunking on open-source people) that don’t actually matter in the long term.
Relevant: my post on value systematization
Though I have a sneaking suspicion that this comment was originally made on a draft of that?