Which kind of impossible-to-solve do you think alignment is, and why?
Do you mean that there literally isn’t any one of the countably infinite set of bit strings that could run as a program on any mathematically possible piece of computing hardware that would “count” as both superintelligent and aligned? That… just seems like a mathematically implausible prior. Even if any particular program is aligned with probability zero, there could still be infinitely many aligned superintelligences “out there” in mind design space.
Note: if you’re saying the concept of “aligned” is itself confused to the point of impossibility, well, I’d agree that I’m at least sure my current concept of alignment is that confused if I push it far enough, but it does not seem to be the case that there are no physically possible futures I could care about and consider successful outcomes for humanity, so it should be possible to repair said concept.
Do you mean there is no way to physically instantiate such a device? Like, it would require types of matter that don’t exist, or numbers of atoms so large they’d collapse into a black hole, or so much power that no Kardashev I or II civ could operate it? Again, I find that implausible on the grounds that all humans combined are made of normal atoms, weigh on the order of a billion tons, and consume on the order of a terawatt of chemical energy in the form of food, but I’d be interested in any discussions of this question.
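For anyone who wants to check those orders of magnitude, here is a rough back-of-envelope sketch. The population, average body mass, and daily food intake are my own ballpark assumptions, not measured figures, so treat the result as good to a factor of a few at best.

```python
# Back-of-envelope check of humanity's total mass and food power.
# Every input below is a rough assumption chosen for illustration.
POPULATION = 8e9             # people (assumed)
AVG_BODY_MASS_KG = 60.0      # assumed average over adults and children
FOOD_KCAL_PER_DAY = 2000.0   # assumed average dietary intake per person
JOULES_PER_KCAL = 4184.0     # definition of the thermochemical kilocalorie
SECONDS_PER_DAY = 86_400.0


def total_mass_tonnes(population=POPULATION, avg_mass_kg=AVG_BODY_MASS_KG):
    """Combined body mass of all humans, in metric tonnes."""
    return population * avg_mass_kg / 1000.0


def total_food_power_watts(population=POPULATION, kcal_per_day=FOOD_KCAL_PER_DAY):
    """Average rate at which all humans consume food energy, in watts."""
    joules_per_person_per_day = kcal_per_day * JOULES_PER_KCAL
    return population * joules_per_person_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"total mass:  {total_mass_tonnes():.2e} tonnes")
    print(f"food power:  {total_food_power_watts():.2e} W")
```

With these inputs the totals come out within a factor of a few of a billion tonnes and a terawatt, which is all the argument needs.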
Do you mean it’s just highly unlikely that humans will successfully find and implement any of the possible safe designs? Then assuming impossibility would seem to make that failure even more likely, self-fulfilling-prophecy style, no? Isn’t trying to fix this problem the whole point of alignment research?