I would love to know why people downvoting disagree :)
Logically, I agree. Intuitively, I suspect that it just won’t happen. But intuition on such alien things should not be a guide, so I fully support some attempt to slow down the takeoff.
Thanks for this!
TBH, I am struggling with the idea that an AI intent on maximising a thing doesn’t have that thing as a goal. Whether or not the goal was intended seems irrelevant to whether or not the goal exists in the thought experiment.
“Goal stability is almost certainly attained in some sense given sufficient competence”
I am really not sure about this, actually. Flexible goals are a universal feature of successful thinking organisms. I would expect that natural selection would kick in at least over sufficient scales (light delay making co-ordination progressively harder on galactic scales), causing drift. But even on small scales, if an AI has, say, 1000 competing goals, I would find it surprising if in a practical sense its goals were actually totally fixed, even if it were superintelligent. Any number of things could change over time, such that locking yourself into fixed goals could be seen as a long-term risk to optimisation for any goal.
“Alignment is not just absence of value drift, it’s also setting the right target, which is a very confused endeavor because there is currently no legible way of saying what that should be for humanity”—totally agree with that!
“AIs themselves might realize that (even more robustly than humans do), ending up leaning in favor of slowing down AI progress until they know what to do about that”—god I hope so haha
I like the point here about how stability of goals might be an instrumentally convergent feature of superintelligence; it’s an interesting one.
On the other hand, intuitive human reasoning would suggest that this is overly inflexible if you ever ask yourself ‘could I ever come up with a better goal than this goal?’. What ‘better’ would mean for a superintelligence seems hard to define, but it also seems hard to imagine that it would never ask the question.
Separately, your opening statements seem to be at least nearly synonymous to me:
“First off the paperclip maximizer isn’t about how easy it is to give a hypothetical superintelligence a goal that you might regret later and not be able to change.
It is about the fact that almost every easily specified goal you can give an AI would result in misalignment”
‘almost every easily specified goal you can give an AI would result in misalignment’ ≈ ‘give a hypothetical superintelligence a goal that you might regret later’ (i.e., misalignment)
The worry that AI will have overly fixed goals (paperclip maximiser) seems to contradict the erstwhile mainline doom scenario from AI (misalignment). If AI is easy to lock into a specific path (paperclips), then it follows that locking it into alignment is also easy, provided you know what alignment looks like (which could be very hard). On the other hand, a more realistic scenario would seem to be that keeping fixed goals for an AI is actually hard, and that drift is likely where the misalignment risk really comes in big time?
I can agree that qualitatively there is a lot left to do. Quantitatively, though, I am still not quite seeing the smoking gun that human-level AI will be able to smash through 15 OOM like this. But, happy to change my mind. I’ll check out the Anthropic link! Cheers.
I don’t really fully understand the research speed up concept in intelligence explosion scenarios, e.g., https://situational-awareness.ai/from-agi-to-superintelligence/
This concept seems fundamental to the entire recursive self-improvement idea, but what if it turns out that you can’t just do:
Faster research x many agents = loads of stuff
What if you quickly burn through everything that an agent can really tell you without doing new research in the lab? You’d then hit a brick wall of progress where throwing 1000000000 agents at 5x human speed amounts to little more than what you get out of 1 agent (being hyperbolic lol).
Presumably this is just me as a non-computer-scientist missing something big and implicit in how AI research is expected to proceed? But ultimately I guess this Q boils down to:
Is there actually 15 orders of magnitude of algorithmic progress to make (at all), and/or can that truly be made without doing something complementary in the real world to get new data / design new types of computers, and so on?
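To make that worry concrete, here is a toy Amdahl’s-law-style sketch (entirely my own framing, with a made-up 10% ‘serial’ fraction; nothing here is taken from the linked essay): if any fixed slice of progress has to wait on real-world experiments, piling on more and faster agents saturates quickly.

```python
# Toy model (my own assumption, not from the linked essay): suppose some
# fraction of research progress has to wait on real-world experiments that run
# at fixed wall-clock speed no matter how many agents you add, while the rest
# parallelises perfectly across agents running at `speed` times human speed.
def research_speedup(n_agents: int, speed: float, serial_frac: float) -> float:
    """Overall speedup vs. one human researcher (Amdahl's-law-style)."""
    parallel_frac = 1.0 - serial_frac
    return 1.0 / (serial_frac + parallel_frac / (n_agents * speed))

# Even a billion agents at 5x human speed can't beat ~10x overall if 10% of
# the work is stuck behind physical experiments (the 10% is purely illustrative).
for n in (1, 1_000, 1_000_000_000):
    print(f"{n} agents -> {research_speedup(n, speed=5, serial_frac=0.10):.2f}x")
```

Obviously the real question is whether that serial slice is more like 10%, 0.001%, or effectively zero.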
Haha I have no idea! I agree the possibility space is huge. All I do know is that we don’t see any evidence of alien AIs around us, so they are a poor candidate for the great filter for other alien races (unless they kill those races and then for some reason kill themselves too, or decide to be non-expansionist every single time).
Isn’t there a bit of a false equivalence tucked into the logic here? Two sides could be equally scared of one another and both feel like underdogs, but that says nothing about who is correct to think that way. Sometimes people just are the underdog. People unable to use democracy to enact change versus elites that consider them dangerous is a good example. The masses in that case are definitely the underdog, as they threaten the status quo of every major power centre (often state, corporations, politicians, and elite institutions all at once). In many European countries, certainly, it is unclear that the masses can do very much to influence policy at all right now. They feel like underdogs because they are. I am sure the elites also feel that they are underdogs… they’re just wrong.
(This is a quick take—don’t take it that seriously if I don’t articulate other people’s views accurately here)
Just listened to a few more episodes of Doom Debates. Something that stands out is that the predictions from the Liron-esque worldview have been consistently overweighted towards doom so far. So, Liron will say things along the lines of ‘GPT-3 could have been doom, we didn’t know either way and got lucky’.
But there was no luck at all in the empirical sense. It could never have been doom; we just didn’t know that for sure. So we took a risk, but it turns out that there was no actual risk.
Based on this, naively, we might then decide to apply a correction to every other pro-doom prediction, on the grounds that the risk factor has been substantially overestimated so far. For a p(doom) of 99.99%, that might take us down to something like 50% (say). But Liron’s p(doom) is typically around 50% at the moment, so applying the same correction would take him into ‘safe’ territory.
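Purely to illustrate the arithmetic (this is my own toy framing of the ‘correction’, treating it as a uniform discount on the odds rather than anything rigorous):

```python
# Toy reading of the "correction": going from a p(doom) of 99.99% down to 50%
# means dividing the odds by roughly 10,000; apply that same discount to 50%.
def apply_odds_discount(p: float, factor: float) -> float:
    """Divide the odds p/(1 - p) by `factor` and convert back to a probability."""
    new_odds = (p / (1 - p)) / factor
    return new_odds / (1 + new_odds)

factor = (0.9999 / 0.0001) / (0.5 / 0.5)   # ~9,999x discount implied above
print(apply_odds_discount(0.5, factor))     # ~0.0001, i.e. about 0.01%
```

So a 50% p(doom) plus the same discount lands somewhere around 0.01%, which is what I mean by ‘safe’ territory (the uniform-odds-discount assumption is doing all the work here).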
Now, I’m being a bit tongue-in-cheek here, but isn’t this worth considering?
I think it is, especially given that the most recent mainline doom scenario I could find from Liron was an updated version of the previous scenario where a misaligned superintelligence optimises for something stupid that results in hell. If the logic that leads to this is the same logic that made wrong predictions in the past, it needs updating further.
For the record, I find misaligned superintelligence that wants non-stupid things that still happen to result in hell / death for humans a lot more convincing.
“In space, the smallest mistake will kill you”
For organic life, not for machines. We have machines crawling all over space, and some have already exited the solar system and are still going.
TBH this analysis seems quite far removed from the capabilities usually imagined for superintelligence. If a machine intelligence can nano-bot humanity out of existence in 1 second then it can definitely go to the moon more easily than we can (and we did that with relative ease). If AI can’t colonise space, then I’m no longer afraid of it at all.
Something undergoing evolution is undergoing self-replication, so as far as I can tell expansion is definitionally needed. I think the second part of your question is more telling, though: colonising space was not adaptive for life until now, but intelligence is the thing that has made it possible. If we build something that is adapted to living in space then I see no real barrier to it then colonising space.
I would disagree fairly strongly: “lobbyists are absolutely dependent on democratic institutions to leverage their wealth into political power, while 50,000 angry people with pitchforks are not”
They are, I think. If they are angry that democracy is ignoring them, then their pitchforks will likely not manage to enact the complicated changes to legislation needed to fix the problem, as you point out. If we care about the power to actually change the things people want changed, that power is vested almost entirely in the hands of the elite and not in pitchforks. Pitchforks could maybe scare elites into doing it, but more likely they just generate chaos, because pitchforks are not the tool for the job. The tools for the job are held by the elites, and they refuse to use them accordingly.
I’m living through this day by day here in Britain. People protest all over the country every day and the government, despite knowing which positions have majority support, just do the opposite continuously and use every mechanism available to delay or obfuscate meaningful change.