To answer your question, the AI alignment problem is the problem of ensuring that the first artificial general intelligence smart enough to take over the world—that is, the Singularity—leaves at least one human being alive. No one knows how to solve it, and it’s likely only rationalists could.
Anyway, welcome, but be warned: this community is full of egotists who use the orthogonality thesis to avoid having to hold coherent moral principles. (Source: an angry vegan who doesn’t understand why AI rights get more attention on this site than animal rights, despite the fact that animals are basically equivalent to humans with brain damage in terms of mind structure, while AIs are aliens from another universe.)
Okay, I just did a deep-dive on the AI alignment problem and the Singularity on Wikipedia, and it will take me a while to digest all of that. My first impression is that it seems like an outlandish thing to worry about, but I am going to think about it more because I can easily imagine the situation reversed.
Among the things I came across was that Eliezer was writing about this back in 1996, and predicted: “Plug in the numbers for current computing speeds, the current doubling time, and an estimate for the raw processing power of the human brain, and the numbers match in: 2021.”
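That back-of-the-envelope extrapolation is easy to reproduce. Here’s a minimal sketch in Python; the 1996 baseline of ~10^12 FLOP/s (roughly a top supercomputer of that era), the ~10^17 ops/s brain estimate, and the 18-month doubling time are all illustrative assumptions on my part, not necessarily the essay’s exact figures:

```python
import math

# Illustrative assumptions (not necessarily Eliezer's 1996 figures):
BASE_YEAR = 1996
BASE_FLOPS = 1e12      # ~top supercomputer circa 1996, in FLOP/s
BRAIN_OPS = 1e17       # one common estimate of brain processing power
DOUBLING_YEARS = 1.5   # ~18-month doubling time (Moore's law)

# Solve BASE_FLOPS * 2**((year - BASE_YEAR) / DOUBLING_YEARS) = BRAIN_OPS
doublings_needed = math.log2(BRAIN_OPS / BASE_FLOPS)
crossover_year = BASE_YEAR + doublings_needed * DOUBLING_YEARS

print(f"Doublings needed: {doublings_needed:.1f}")   # ~16.6
print(f"Crossover year:   {crossover_year:.0f}")     # ~2021
```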
GPT-3 has 175 billion parameters and the human brain has about 86 billion neurons, and I know it’s hand-waving because model parameters aren’t equivalent to human neurons, but—not bad! On the other hand, we’re seeing now what this numerical correspondence translates to in real life, and it’s interestingly different from what I had imagined. AI is passing the Turing test, but the Turing test no longer feels like a hard line in the sand; it doesn’t seem to be testing what it was intended to test.
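For scale, a rough comparison (the synapse count is a commonly cited order-of-magnitude estimate, and equating parameters with synapses rather than neurons is itself a contestable assumption):

```python
GPT3_PARAMS = 175e9     # GPT-3 parameter count
BRAIN_NEURONS = 86e9    # ~86 billion neurons in the human brain
BRAIN_SYNAPSES = 1e14   # commonly cited order-of-magnitude estimate

# Parameters slightly outnumber neurons, but synapses (arguably the
# closer analogue to parameters) outnumber parameters by ~500x.
print(f"Parameters per neuron:  {GPT3_PARAMS / BRAIN_NEURONS:.1f}")   # ~2.0
print(f"Synapses per parameter: {BRAIN_SYNAPSES / GPT3_PARAMS:.0f}")  # ~571
```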
“No one knows how to solve it, and it’s likely only rationalists could.”
Understanding what, exactly, human values are would be a first step toward expressing them in an AI. I hadn’t expected meta-ethics to get so applied.
...
You know what’s really odd? The word “singularity” appears only 33 times in the Sequences, mostly as “when I attended the Singularity Summit, someone said...” and such, without explanation of what it was. Most of the references were in the autobiographical section, which I didn’t read as deeply as the rest.
Figuring out what human values actually are is a pretty important part of the project, though we’d still have to figure out how to align an AGI to them. Still, there is no end of use for applied meta-ethics here. You might also want to look into the Shard Theory subcommunity here: @TurnTrout and others are working on understanding how human values arise in the first place as “shards” of a much simpler optimization process in the human brain.
“...problem of ensuring that the first artificial general intelligence...”
Transitive misalignment (successors/descendants of the first AGIs being misaligned at some point) is exactly as deadly as direct misalignment (in physical time there isn’t even much distance between the two; the singularity is fast). So not only must the first AGIs be aligned, they additionally need to be in a situation where they don’t build misaligned AGIs as soon as they are able. And Moloch doesn’t care about your substrate: by default it’s going to be as much of a problem for AGIs as it currently is for humanity.
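One way to see why transitive misalignment compounds: even if each “generation” of AGIs builds an aligned successor with high probability, the chance that the whole chain stays aligned decays geometrically. A toy calculation (the 99% per-generation success rate is purely an illustrative assumption):

```python
# Toy model: each generation builds an aligned successor with probability p.
# P(chain still aligned after n generations) = p**n, so even small
# per-generation risk compounds toward near-certain eventual misalignment.
p = 0.99  # illustrative per-generation alignment success rate

for n in (1, 10, 100, 500):
    print(f"After {n:3d} generations: P(still aligned) = {p**n:.3f}")
# After   1 generations: P(still aligned) = 0.990
# After  10 generations: P(still aligned) = 0.904
# After 100 generations: P(still aligned) = 0.366
# After 500 generations: P(still aligned) = 0.007
```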
You’re correct, but since I define “aligned” as “tending to do what is actually best according to humanity’s value system”, and since taking such a risk would be harmful, a totally aligned AGI would not, in fact, take that risk lol. So although your addition is important to note, there’s a sense in which it is redundant.
Both direct and transitive alignment are valuable concepts, especially for LLM AGIs, which I think are the only feasible directly aligned AGIs we are likely to build, but which I suspect won’t be transitively aligned by default.
Since transitive alignment varies among humans (different humans have different inclinations toward building AGIs of uncertain alignment, given the capability to do so), it might be valuable to align LLM personalities to become the kind of people who are less likely to fail transitive alignment.