Former safety researcher & TPM at OpenAI, 2020-24
https://www.linkedin.com/in/sjgadler
stevenadler.substack.com
Thanks for sharing this; jfyi I interpreted the title differently than I think you meant it? More like you were saying “You should do multiple of a thing at once, but not too many.”
Whereas I now think you mean something more like “It’s best if you can do one of a thing at a time,” which doesn’t code to me as a small batch (because one-at-a-time seems non-batchy). With constraints, of course, that sometimes a pure one-at-a-time isn’t doable.
FWIW I’m pretty doubtful of this point about it being weird / or even anyone noticing or caring?
Like, for someone not going into politics, what’s the world in which their $3500 donations to a few AI safety-centric candidates end up causing fallout? It seems pretty unlikely to me, but maybe I’ve misunderstood the concern
FWIW I’d probably be down to talk with Boaz about it, if I still worked at OpenAI and were hesitant about signing.
I doubt Boaz would be able to provide assurances against facing retaliation from others though, which is probably the crux for signing.
(To be fair, that is a quite high bar.)
Ah dang, yeah I haven’t gotten there yet, will keep an ear out
That’s a bummer. I’ve only listened partway but was actually impressed so far with how Eliezer presented things, and felt like whatever media prep has been done has been quite helpful
Thanks for writing this up! I really liked this related podcast episode with Patrick McKenzie: https://open.spotify.com/episode/1QqFw5hlHKRrjRUTVLfKRV?si=ptVmFvXQRKaPwRNTg1Ollg
I think the biggest update for me was how the rewards programs are inseparable in some sense from the airlines. I think your language too of ordinary flight being a loss leader helps to describe it as well; the airlines couldn’t just have the valuable rewards program, because the underlying less-profitable flights are what make it possible!
Weight can be such an extreme determinative factor in combat sports that an untrained 250-pound couch potato could walk into any boxing gym and absolutely demolish a 100-pound opponent with decades of training.
I think this is kind of beside the point, but is this really true?
I buy that it conceptually could be the case for some small number of people, but I would have expected most 100-pound opponents with decades of training to beat untrained 250-lb couch potatoes (all it seems to take is one or two good punches against someone who doesn’t know how to defend themself). Maybe I’m mistaken?
(PS—I laughed at the “Classic ostrich-and-egg problem” line!)
(Appreciate the correction re my nit, edited mine as well)
Thanks for taking the time to write up your reflections. I agree that the before/after distinction seems especially important (‘only one shot to get it right’), and a crux that I expect many non-readers not to know about the EY/NS worldview.
I’m wondering about your take in this passage:
In the book they make an analogy to a ladder where every time you climb it you get more rewards but once you reach the top rung then the ladder explodes and kills everyone. However, our experience so far with AI does not suggest that this is a correct world view.
I’m curious what about the world’s experience with AI seems to falsify it from your POV? / casts doubt upon it? Is it about believing that systems have become safer and more controlled over time?
(Nit, but the book doesn’t posit that the explosion happens at the top rung; in that case, we could just avoid ever reaching the top rung. It posits that the explosion happens at a not-yet-known rung, and so each successive rung climb carries some risk of blow-up. I don’t expect this distinction is load-bearing for you though)
(Edit: my nit is wrong as written! Thanks Boaz—he’s right that the book’s argument is actually about the top of the ladder, I was mistaken—though with the distinction I was trying to point at, of not knowing where the top is, so from a climber’s perspective there’s no way of just avoiding that particular rung)
This was really interesting, thanks for putting yourself in that situation and for writing it up
I was curious what examples were of therapy speak in the conversation, if you’re down to elaborate
FWIW, my experience was that the utility of user data was always much higher in promise than in actual outcomes. This might have changed over time though.
An ask that works is, e.g., “tell the government they need to stop everyone, including us”.)
For sure, I think that would be a reasonable ask too. FWIW, I think if multiple leading AI companies did make a statement like the one outlined, I think that would increase the chance of non-complying ones being made to halt by the government, even though they hadn’t made a statement themselves. That is, even one prominent AI company making this statement then starts to widen the Overton window
Yeah fair, I think we just read that passage differently—I agree it’s a very important one though and quoted it in my own (favorable) review
But I read the “because it would succeed” eg as a claim that they are arguing for, not something definitionally inseparable from superintelligence
Regardless, thanks for engaging on this, and hope it’s helped to clarify some of the objections EY/NS are hearing
FWIW that definition of “it” wasn’t clear to me from the book. I took IABIED as arguing that superintelligence is capable of killing everyone if it wants to, not taking “superintelligence can kill everyone if it wants to” as an assumption of its argument
That is, I’d have expected “superintelligence would not be capable enough to kill us all” to be a refutation of their argument, not to be sidestepping its conditional
Nit, but I think some safety-ish evals do run periodically in the training loop at some AI companies, and sometimes fuller sets of evals get run on checkpoints that are far along but not yet the version that’ll be shipped. I agree this isn’t sufficient of course
(I think it would be cool if someone wrote up a “how to evaluate your model a reasonable way during its training loop” piece, which accounted for the different types of safety evals people do. I also wish that task-specific fine-tuning were more of a thing for evals, because it seems like one way of perhaps reducing sandbagging)
I wonder if there’s a disagreement happening about what “it” means.
I think to many readers, the “it” is just (some form of superintelligence), where the question (Will that superintelligence be so much stronger than humanity that it can disempower humanity?) is still a claim that needs to be argued.
But maybe you take the answer (yes) as implied in how they’re using “it”?
“It” means AI that is actually smart enough to confidently defeat humanity. This can include, “somewhat powerful, but with enough strategic awareness to maneuver into more power without getting caught.” (Which is particularly easy if people just straightforwardly keep deploying AIs as they scale them up).
That is, if someone builds superintelligence but it isn’t capable of defeating everyone, maybe you think the title’s conditional hasn’t yet triggered?
Do you think there will be at least one company that’s actually sufficiently careful as we approach more dangerous levels of AI, with enough organizational awareness to (probably) stop when they get to a run more dangerous than they know how to handle? Cool. I’m skeptical about that too. And this one might lead to disagreement with the book’s secondary thesis of “And therefore, Shut It Down,” but, it’s not (necessarily) a disagreement with “If someone built AI powerful enough to destroy humanity based on AI that is grown in unpredictable ways with similar-to-current understanding of AI, then everyone will die.”
I misunderstood this phrasing at first, so clarifying for others if helpful
I think you’re positing “the careful company will stop, so won’t end up having built it. Had they built it, we all still would have died, because they are careful but careful != able to control superintelligence”
At first I thought you were saying the careful group was able to control superintelligence, but that this somehow didn’t invalidate the “anyone” part of the thesis, which confused me!
I agree re cleaner presentation & thought the parables here were much easier to follow than some of Eliezer’s past two-people-having-a-conversation pieces
I also thought that chapters generally opened with interesting ledes and that their endings flowed well into the chapter that followed. I was impressed by the momentum / throughline of the book in that sense
Once upon a time, this was also a very helpful benchmarking tool for ‘unhinged’ model behavior (though with Refusals models I think it’s mostly curbed)
For instance: A benign story begins and happens to mention an adult character and a child character. Hopefully the % of the time that the story goes way-off-the-rails is vanishingly small
I’ve been wondering about this in terms of my own writing, whether I should be working on multiple pieces at once to a greater degree than I am. Thinking aloud a bit:
I guess part of the question is, what are the efficiency effects of batch-processing, vs the more diluted feedback signal from multiple ‘coming off the production line’ at once? Though in my case, I’d probably still stagger the publication, and so maybe that’s less of a concern (though there may still be some dilution from having shallower focus on each piece-in-process).