LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
I think there is some way that the conversation needs to advance, and I think this is roughly carving at some real joints and it’s important that people are tracking the distinction.
But
a) I’m generally worried about reifying the groups more into existence (as opposed to trying to steer towards a world where people can have more nuanced views). This is tricky, there are tradeoffs and I’m not sure how to handle this. But...
b) this post's title and framing in particular are super leaning into the polarization, and I wish they did something different.
I think political donations are quite valuable and worth tanking this cost, but, I want to also warn: if you make a political donation, you will start getting a lot of political spam that you need to filter.
IABIED says alignment is basically impossible
....no it doesn’t? Or, I’m not sure how liberal you’re being with the word “basically”, but, this just seems false to me.
Cope Traps
Come on, I’m not doing this to you
The substance of what I mean here is "there is a failure mode, exemplified by, say, the scientists studying insects and reproduction who predicted the insects would evolve to have fewer children when there weren't enough resources, but what actually happened is they started eating the offspring of rival insects of their species."
There will be a significant temptation to predict "what will the AI do?" while kinda hoping/expecting particular kinds of outcomes*, instead of straightforwardly rolling the simulation forward.
I think it is totally possible to do a good job with this, but, it is a real job requirement to be able to think about it in a detached/unbiased way.
*which includes, if an AI pessimist were running the experiment, assuming the outcome is always bad, to be clear.
Yeah, the first paragraph is meant to allude to “there is some kind of fact of the matter” but not argue it’d be any particular thing.
This post discusses how they might arise, and I am telling you that you can think about the mechanism you propose here to understand the properties of the goals that are likely to arise as a result of it[1]. This addresses a question that the FAQ you link does not: what can we say about what goals are likely to arise?
Yeah, I agree there’s some obvious followup worth doing here.
I agree it’s possible to make informed guesses about what drives will evolve (apart from the convergent instrumental drives, which are more obvious), and that’s an important research question that should get tons of effort. (I think it’s not in the IABIED FAQ because IABIED is focused on the relatively “easy calls”, and this is just straight up a hard call that involves careful research with the epistemic-grounding to avoid falling into various Cope Traps)
But, one of the "easy calls" is that "it'll probably be pretty surprising and weird." Because, while maybe we could have a decently accurate science of sub-human and eventually slightly-superhuman AI, once the AI's capabilities rise to Extremely Vastly Powerful, it will find ways of achieving its goals that aren't remotely limited by any of the circumstances of its "ancestral environment."
I don’t have immediate followup thoughts on “but how would we do the predicting?” but if you give me a bit more prompting on what directions you think are interesting I could riff on that.
Ah, sorry I didn’t understand your question. In the particular section you quoted, I didn’t mean to be saying anything about how End Goals end up random. I only meant that to explain “how does the AI even consider trying to escape the lab in the first place?” (which is a convergent instrumental goal between most possible End Goals)
I didn't mean this post to really be talking much at all about the selection of End Goals (which I think is pretty well covered by the "You Don't Get What You Train For" chapter and FAQ).
This post is about how
a) before the AI realizes it might have diverging goals it wants to protect, it'll be incentivized to start escaping its prison just by following the core training of "try to achieve goals creatively" (which is more likely to be "pseudorandom" in the scenario where it's trying to solve a very difficult problem)
and b) the more it starts thinking seriously about its goals, the more opportunity it'll have to notice that its goals diverge at least somewhat from humans.
I'm happy to talk about How AIs Get Weird End Goals if that's a thing you are currently skeptical of and interested in talking about, but, this post wasn't focused on that part, more just taking it as a given for now.
When I have a problem I know how to solve, I use heuristics. When I have a problem I don't know how to solve, I have to creatively explore action space (using heuristics to guide my search, but, the heuristics are entirely meta-level things like "what is the biggest bottleneck?". For the case of the AI, "I only have X compute to work with" will be a big bottleneck for most hard things. "Access to better information, or ability to run experiments" may be another).
But, once I get to solving those bottlenecks, those solutions will look more surprising: they necessarily have to come from further afield, because if they came easily, this would be an easy problem, not a hard one, and I'd just solve it using normal heuristics. I.e. we're specifically talking about places where all the usual obvious things have already failed.
If you need to solve a complex problem that you could brute force with a billion units of compute, and you only have a million units of compute, and you don’t know how to solve it, you either need to figure out how to get a billion units of compute, or invent a new way of thinking that is outside your current paradigm.
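A minimal sketch of that budget arithmetic, just to make the forcing move concrete. All numbers and the `plan` function are hypothetical illustrations, not anything from the original post:

```python
# Toy model of the compute-budget argument above (all numbers made up).
BRUTE_FORCE_COST = 1_000_000_000  # evaluations a brute-force search would need
COMPUTE_BUDGET = 1_000_000        # evaluations actually available

def plan(usual_heuristics_work: bool) -> str:
    """Pick a strategy for a problem, given whether normal heuristics suffice."""
    if usual_heuristics_work:
        return "solve it the normal way, with object-level heuristics"
    if BRUTE_FORCE_COST <= COMPUTE_BUDGET:
        return "just brute force it"
    # Hard problem + tight budget: the only remaining moves are meta-level,
    # e.g. acquire more compute, or invent a cheaper out-of-paradigm approach.
    shortfall = BRUTE_FORCE_COST // COMPUTE_BUDGET
    return (f"either get ~{shortfall}x more compute, "
            "or find a new way of thinking that sidesteps the brute force")

print(plan(usual_heuristics_work=False))
```

The point of the toy version is just that the interesting branch is exactly the one where the cheap, predictable options have already been ruled out.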
Also I’m very near (and plausibly literally on) the global Pareto frontier in how much I appreciate all of MAGA-type politics, rationalist-type analysis, and hippie-type discussion of trauma, embodied emotions, etc. I’ve tried to include enough of all of these in there that very few people will consistently think “okay, I get it”.
Yeah, this is why I would recommend the series to people who aren't already following you relatively closely. I'm mostly like "will I get something out of this if I'm already reading most of what Richard and Samo say online?" (I don't actually read Samo in depth, but I skim him.)
A thing that's very unobvious to me and hard to figure out: how much depth is there here? There's a version of this I'd expect to be pretty interesting even if I already follow you and Samo, and a version where I spend the whole time thinking "okay, I get it."
(for onlookers: I would not want to rely solely on a series by Samo and Richard for getting my political background knowledge, but I've historically found them both useful frames to have in my pocket)
You are importantly sliding from one point to another, and this is not a topic where you can afford to do that. You can’t just tally up the markers that sort of vibe towards “how dangerous is it?” and get an answer about what to do. The arguments are individually true, or false, and what sort of world we live in depends on which specific combination of arguments are true, or false.
If it turns out there is no political will for a shut down or controlled takeoff, then we can’t have a shut down or controlled takeoff. (But that doesn’t change whether AI is likely to FOOM, or whether alignment is easy/hard)
If AI Fooms suddenly, a lot of AI alignment techniques will probably break at once. If things are gradual, smaller things may break 1-2 at a time, and maybe we get warning shots, and this buys us time. But, there’s still the question of what to do with that time.
If alignment is easy, then a reasonable plan is “get everyone to slow down for a couple years so we can do the obvious safety things, just less rushed.” If alignment is hard, that won’t work, you actually need a radically different paradigm of AI development to have any chance of not killing everyone – you may need a lot of time to figure out something new.
if warning shots are possible, a lot of EY’s arguments don’t hold as straightforwardly
None of IABIED's arguments had to do with "are warning shots possible?", but even if they did, it is a logical fallacy to say "warning shots are possible, so EY's arguments are less valid, therefore this other argument that had nothing to do with warning shots is also invalid." If you're doing that kind of sloppy reasoning, then when you get to the warning-shot world, if you don't understand that overwhelmingly powerful superintelligence is qualitatively different from non-overwhelmingly powerful superintelligence, you might think "angle for a 1-2 year slowdown" instead of trying for a longer global moratorium.
(But, to repeat, the book doesn't say anything about whether warning shots are possible.)
The original topic of this thread is "Why no in-between?": why should we think that there is no "in between" period where AI is powerful enough that it might be able to kill us, but weak enough that we might win the fight?
This is not a question about whether we can decide not to build ASI; it's a question about what would happen if we did build it.
Certainly there’s lots of important questions here, and “can we coordinate to just not build the thing?” is one of them, but it’s not what this thread was about.
This doesn’t feel like an answer to my concern.
People might be much less complacent, which may give you a lot more resources to spend on solving the problem of “contend with overwhelming superintelligence.” But, you do then still need a plan for contending with overwhelming superintelligence.
(The plan can be “stop all AI research until we have a plan”. Which is indeed the MIRI plan)
I’m actually kind of interested in getting into “why did you think your answer addressed my question?”. It feels like this keeps happening in various conversations.
Aside – I think it’d be nice to have a sequence connecting the various scenes in your play.
Also, I separately think at some point it’d be helpful to have something like a “compressed version of the main takeaways of the play that would have been a helpful textbook from the intermediate future for younger Zack.”
(You shudder involuntarily and wish your brain had generated a different arbitrary example; you still occasionally have nightmares about your injuries during the Summer of Wolves back in ’25.)
Great news, the Summer of Wolves did not happen (that I know of, at least?). We are probably not in this timeline.
Better sleep, more connection with the “third state of consciousness” that is between waking and sleeping, we essentially never struggle to settle Cadence down for bed, etc.
I’ve heard Logan’s description of the Third State.
On your end, does this include limiting screens at night? How much Third State do you get?
They made it pretty clear that a few large nations cooperating just on AGI non-creation is enough.
I’d describe this more like “this would make a serious dent in the problem”, enough to be worth the costs. “Enough” is a strong word.
Oh great news!
I'm curious what the raw state is like… what metadata do you currently have about a given song or slice-of-a-song?
This is not exactly the same thing, but a quote from The Steampunk Aesthetic that feels relevant (the context is not quite about me pushing in a direction and getting/losing pushback, but, it was about ending up in a place much more like the ancestral environment and noticing how much easier it was to be healthy).
v. Why can’t we have hard things?
While repairing the boat, I’d spend all day lifting heavy things and climbing around athletically and nothing about it felt hard or obnoxious – and all of it was deeply intertwined with solving interesting problems, and developing my own tools to do so.
It threw into sharp relief the ridiculousness of my default-world life, where I struggled to remember to go to the gym for 20 minutes or do 12 pushups or whatever. For some reason, being able to "just lift or climb things like it's nothing" requires a context switch.
Spiegelman’s Monster
Kaj Sotala once told the story of what happens when you remove all restrictions from a life form:
In 1967, the biologist Sol Spiegelman took a strand of viral RNA, and placed it on a dish containing various raw materials that the RNA could use to build new copies of itself. After the RNA strands had replicated on the dish, Spiegelman extracted some of them and put them on another dish, again with raw materials that the strands could use to replicate themselves. He then kept repeating this process.
No longer burdened with the constraints of needing to work for a living, produce protein coats, or to do anything but reproduce, the RNA evolved to match its new environment. The RNA mutated, and the strands which could copy themselves the fastest won out. Everything in those strands that wasn’t needed for reproduction had just become an unnecessary liability. After just 74 generations, the original 4,500 nucleotide bases had been reduced to a mere 220. Useless parts of the genome had been discarded; the viral RNA had now become a pure replicator, dubbed “Spiegelman’s monster”. (Source.)
Later going on to say:
As technology keeps evolving, it will make it easier and easier to overcome various constraints in our environment, our bodies, and in our minds. And then it will become increasingly tempting to become a Spiegelman's monster: to rid yourself of the things that the loosened constraints have made unnecessary, to become something that is no longer even remotely human. If you don't do it, then someone else will. With enough time, they may end up ruling the world, outcompeting you like Spiegelman's monster outcompeted the original, unmutated RNA strands.
There’s a bunch of important philosophical questions packaged together here, that will become increasingly important if we get an Age of Em or something similar. If things go badly, we get Moloch’s endgame. If they go well, maybe someday the whole world can be designed such that the opportunities available to us are more aligned with our physical needs.
But in the meanwhile, how much of this can be applied to day-to-day life?
In the comments here I’m interested in the near-term, practical question: What sort of constraints might be useful to preserve (or recapture) right now, to improve quality of life in the present day?
Yeah it occurs to me reading this that, while I have used AI to code easy things faster, and sometimes code “hard things” at all (sometimes learning along the way), I haven’t used it to specifically try to “code kinda normally while reducing more tech debt along the way.” Will think on that.
Hadn't heard of it. Will take a look. Curious if you have any tips for getting over the initial hump of grokking its workflow.
(like, even specifically resolving the lack-of-nuance this post complains about requires distinguishing between "never build ASI" and "don't build ASI until it can be done safely", which isn't covered in the Two Sides)