The post is about what “adulthood” means for goal engines, and where the vector from baby to adulthood points. Current AI safety work is only relevant to a “system that is still sufficiently baby-like”. But we should expect goal engines to be extremely mature. When you are negotiating with a human adult who is trying to maximize their company’s profit, there is no need to study the phenotype of the 3-month embryo that once scaffolded that human.
Liron
Interview with Steven Byrnes on His Mainline Takeoff Scenario
Humans at 1000x speed still retain many properties of immature goal engines, full of abstraction-breaking silly quirks, the same way ENIAC at 1000x speed can still get a literal bug (like a moth) in it. The direction of progress after ENIAC did not point toward ENIAC at 1000x speed.
P.S. Thanks for being the only one so far to engage with my claim.
I took the liberty of exaggerating “a 2-digit number of people” into a “nonexistent field” :)
The idea explained in the post, in a way that I don’t know of any other reference explaining, is that there is a disconnect between the expected character of a mature goal engine and the nature of the tools being developed under the name “AI safety”.
The Facade of AI Safety Will Crumble
If the post itself was ambiguous, I think there has been a ton of evidence in the 3+ years since that post that this community has a VERY non-fatalistic attitude about the situation.
I interpreted Eliezer’s message in that piece not as it being inevitable, but as there being many layers of problems that would need to be fixed, but with very little evidence that most of the layers had much hope of being fixed. In my view Eliezer has consistently been nimble about updating on evidence, but he thinks the path to extinction is vastly overdetermined unless many surprising updates come his way.
How Dario Amodei’s “The Adolescence of Technology” Delegitimizes AI X-Risk Concerns
PSA to those with flat or otherwise imperfect feet:
I finally got custom-made orthotics, and they’re very different from off-the-shelf orthotics, with way more correction than I expected, in a good way. Highly recommended!
Amazing post. Meta-level it’s very well argued and good-faith, and object-level these arguments are spot on IMO, especially how you unpacked the details of exactly how his post falls victim to the Multiple Stage Fallacy.
I debated BB a couple days ago for an upcoming episode of Doom Debates, and while I warned him that MSF in complex domains is a huge trap that makes arguments like his almost never work, I wasn’t able to pin down the problem with his stages the way you did here.
I’m really happy with the meta-level quality of BB’s original post and your reply (and with BB’s conduct in our Doom Debate). I wish discourse of this caliber among the various AI x-risk positions was much more common.
Here’s my recent interview with Tsvi about the Berkeley Genomics Project. I asked him what I think are the cruxy questions about whether it’s worth supporting, and I think the conclusion is yes!
I suspect the real disagreement between you and Anthropic-blamers like me is downstream of a P(Doom) disagreement (where yours is pretty low and others’ is high), since I’ve often seen that be the root of disagreements between smart people.
Realistically/pragmatically balanced moves in a lowish-P(Doom) world are unacceptable in a high-P(Doom) world.
AI Corrigibility Debate: Max Harms vs. Jeremy Gillen
I just noticed the LessWrong site loads a lot faster than it used to. Very cool!
Makes sense. Only problem is, bear fat + sugar + salt seems qualitatively pretty similar to ice cream. It doesn’t seem like it neglected the qualitative spirit of why ice cream is good, which just adds to the fine parsing needed to get value out of this.
The fact still stands that ice cream is what we mass produce and send to grocery stores.
Yeah, I guess this exact observation is critical to making Eliezer’s analogy accurate.
IMO “predicting that bear fat with honey and salt tastes good” is analogous to “predicting that harnessing a star’s power will be an optimization target” — something we probably can successfully do.
And “predicting bear fat (or some kind of rendered animal fat) with honey and salt will be a popular treat”—the thing we couldn’t have done a priori—is analogous to “predicting solar-to-electricity generator panels will be a popular fixture on many planets” (since the details probably will turn out to have some unpredictable twists), and also to “predicting that making humans satisfied with outcomes will be an optimization target for AIs in the production environment as a result of their training”.
I think this analogy is probably right, but the sense in which it’s right seems sufficiently non-obvious/detailed/finicky that I don’t think we can expect most people to get it?
Plus IMO it further undermines the pedagogical value of this example to observe that a drinkable form of ice cream (shakes) is also popular, plus there’s gelato / frozen yogurt / soft serve, and then thick sweet yogurts and popsicles… it’s a pretty continuous treat-fitness landscape.
I do think Eliezer is importantly right that the exact market-winning peak in this landscape would be hard to predict a priori. But is the hardness also explained by the peak being dependent on chaotic historical/cultural forces?
And that’s why I personally don’t bring up the bear fat thing in my AI danger explanations.
Seems like the rapid-fire nature of an InkHaven writing sprint is a poor fit for a public post under a personally-charged summary bullet like “Oliver puts personal conflict ahead of shared goals”.
High-quality discourse means making an effort to give people the benefit of the doubt when making claims about their character. It’s worth taking time to carefully follow our rationalist norms of epistemic rigor, productive discourse, and personal charity.

I’d expect a high-evidence post about a very non-consensus topic like this to start out in a more norm-calibrated and self-aware epistemic tone, e.g. “I have concerns about Oliver’s decisionmaking as leader of Lightcone based on a pattern of incidents I’ve witnessed in his personal conflicts (detailed below)”.
Maybe Lightcone Infrastructure can just allow earmarking donations for LessWrong, if enough people care about that criticism.
Thanks for that. Should be fixed now.