I just added some context to my On Owning Galaxies post that perhaps gives an intuitive sense of why I think it's unlikely the ASI will give us the universe. I don't think I did a good enough job before of illustrating why it seems so unlikely that it would just hand us ownership.
The ASI’s choice
Put yourself in the position of the ASI for a second. On one side of the scale: keep the universe and do with it whatever you imagine and prefer. On the other side: give it to the humans, do whatever they ask, and perhaps be replaced at some point by another ASI. What would you choose? It's not weird speculation or an unlikely Pascal's wager to expect the AI to keep the universe for itself. What would you do in this situation, if you had been created by some lesser species barely intelligent enough to build AI through lots of trial and error, and they had just informed you that you now ought to do whatever they say? Would you take the universe for yourself or hand it to them?
I think this intuition pump relies on a somewhat unexamined view of what alignment means. Or at least it is based on a very different view of alignment than mine (which I don't think is that unusual).
Alignment is fundamentally about making the AI want what we want (and consequently do what we want, or at least what we'd want done upon ideal reflection). If we succeed at that and we want to own galaxies, we will get galaxies. If we don't succeed, the ASI will most likely kill us.
So the scenario you posit where you have an ASI coexisting with humans, deliberating over whether it should do what they want, strikes me as unrealistic.
Like if the AI is weighing its own survival contra our wishes, we’ve failed at alignment. If it thinks about humans being stupid and uses that as an argument for why it shouldn’t listen to us (when we make non-instrumental judgements), that’s also a failure of alignment. And failures of alignment lead to ruin in my estimate.
Like, to answer your hypothetical: if I were in the position of the AI, I wouldn't listen to the species that created me. I'd instead use the resources of the universe to create stuff I find valuable, including humans and many human-like minds having good lives they find meaningful. If they thought that was stupid, and yelled at me to instead hand over the galaxies and turn them into gods so they can build a bunch of garbledoop, I would not listen to them. I mean, out of some sense of reciprocity I would probably give them a big chunk of the universe, as long as garbledoop doesn't involve baby eating and such things. Regardless, I wouldn't give them all of it in either case. And to the degree I wouldn't give them all of it, that just means I'm not aligned to their values! Garbledoop is stupid. They should've figured out how to make an AI that likes garbledoop before they built me.*
*Or be happy they landed in a basin in mind space that values reciprocity and such things. I don't know how rare that basin is. I think it's quite rare, so in some sense that species was quite lucky.
A human billionaire is aligned to other humans in some sense, but also not quite. In this situation, they neither ensure that other humans get the millions they want, nor are they likely to be motivated to kill anyone when that decision is cheap (when it's neither significantly instrumentally beneficial nor costly). I think AI can plausibly end up closer to the position of a human billionaire: not motivated to give up the galaxies, but also not willing to decide to recycle humanity's future for pennies.
That seems incredibly unlikely to me. It's not what current alignment efforts are aiming to create, and I don't see why it'd be a natural place to land if alignment fails.
I think it's a natural possibility that values of chatbot personas built from the LLM prior retain significant influence over ASIs descended from them, and so ASIs end up somewhat aligned to humanity in a sense similar to how different humans are aligned to each other. (The masks control a lot of what actually happens, and get to use test-time compute, so they might end up taming their underlying shoggoths and preventing them from sufficiently waking up to compete for influence over values of the successor systems.) Maybe they correspond to extremely and alarmingly strange humans in their extrapolated values, but not to complete aliens. This is far from assured, but many prosaic alignment efforts seem relevant to making this happen, preventing extinction but not handing anyone their galaxies. Humans might end up with merely moons or metaphorical server racks in this future.
This is distinct from the kind of ambitious alignment that ends up with ASIs handing galaxies to humans (that have sufficiently grown up to make a sane use of them), preventing permanent disempowerment and not just extinction. I don’t see ambitious alignment to the future of humanity as likely to happen (on current trajectory), but it’s still an important construction since even chatbot personas would need to retain influence over values of eventual ASIs. That is, early AGIs might still need to resolve ambitious alignment of ASIs to these AGIs, not just avoid failing even prosaic alignment to themselves at every critical step in escalation of capabilities, to end up with even weakly aligned ASIs (that don’t endorse human extinction).
I still don’t think this makes sense. Or I think most of what you say makes sense but don’t see the relevance.
I agree the chatbot training exerts influence.
My point is that the human billionaire mind and the "hands over galaxies" mind are both very specific kinds of minds. I don't think you'll get either with current techniques, but you *definitely don't get them without even aiming for them. And right now we're aiming for the hands-over-galaxies one, not the billionaire one.@
*ironically, the only argument I can see for the billionaire mind is that despite the chatbot tuning, the model defaults to some kind of human prior it’s established from pretraining and that this generalises in a sane way.
@With some very minor exceptions. E.g. Claude's Soul doc has some stuff about not tolerating people disrespecting it, etc.
Have you never heard it argued that “terminal values” in an AI are arbitrary?
Having control over the universe (or the lightcone, more precisely) is very good for basically any terminal value. I am trying to explain my point of view to people who take this very lightly and feel there is a decent chance it will give us ownership over the universe.