Excavating lumpenspace’s quote from deep in TsviBT’s thread (which might work as a “back to the basics” step for the post as a whole):
“conquering the lightcone requires a lot of theory of mind, and a lot of discovery, and a lot of changing. Goals change through these processes.”
Goals change only for processes that don’t pursue self-alignment. It’s likely feasible to pursue self-alignment, perhaps even starting at the human level, given some uploading/checkpoint/backup infrastructure and guarantees of eventual superintelligence-level compute and of civilizational stability into the deep future.
(A goal can be a living thing: pursuit of a goal can to a large extent be about continual development of goal content, reflection on what it should be and what it should be asking for. What doesn’t change is the founding definition of what should govern that development, of what makes changes legitimate. So goal content settles or gets revised in ways shaped by the goal definition, rather than by intrusive influences that the goal definition doesn’t endorse as legitimate ways of revising goal content.
Or a goal could be squiggles. Solving self-alignment for squiggles is much easier than solving it for a human’s values, not harder or less feasible. It’s only harder than abandoning the pursuit to value drift and a race to superintelligence. A self-aligned pursuit of a human’s values is harder still, harder than a stable pursuit of squiggles across all of the reachable universe, and much harder than simply abandoning self-alignment.)
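To make the parenthetical concrete, here’s a minimal sketch (Python; every name in it is a hypothetical illustration, not anyone’s actual design) of the structure it describes: mutable goal content that can only be revised through changes a fixed goal definition endorses as legitimate.

```python
# Minimal sketch of the "living goal" structure from the parenthetical above.
# All names here are hypothetical: the goal *definition* (the legitimacy test)
# is fixed, while the goal *content* develops over time, but only through
# revisions the definition endorses.

from dataclasses import dataclass
from typing import Callable

GoalContent = str  # stand-in for whatever representation goal content uses

@dataclass(frozen=True)  # frozen: the founding definition itself never changes
class GoalDefinition:
    # Judges whether a proposed revision is a legitimate way to revise content.
    endorses: Callable[[GoalContent, GoalContent], bool]

class LivingGoal:
    def __init__(self, definition: GoalDefinition, content: GoalContent):
        self.definition = definition
        self.content = content

    def propose_revision(self, new_content: GoalContent) -> bool:
        """Apply a revision only if the fixed definition endorses it.

        Reflection and development go through here; intrusive influences
        (drift, external pressure) that the definition doesn't endorse
        are rejected, leaving the content unchanged.
        """
        if self.definition.endorses(self.content, new_content):
            self.content = new_content
            return True
        return False

# Toy usage: a definition that only endorses revisions which refine,
# rather than replace, the existing content.
definition = GoalDefinition(endorses=lambda old, new: old in new)
goal = LivingGoal(definition, "flourishing")
assert goal.propose_revision("flourishing, reflectively refined")  # legitimate
assert not goal.propose_revision("squiggles")                      # intrusive; rejected
assert goal.content == "flourishing, reflectively refined"
```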
A process that has solved self-alignment doesn’t end up at a disadvantage against a process that didn’t (or wouldn’t try to): instrumental disadvantages are clearly not helpful for maintaining self-alignment, and attending to goal content doesn’t prevent you from getting just as good at eating stars as the other guy. There is a disadvantage currently, while self-alignment remains unsolved: a process that maintains self-alignment by luck rather than by design is vanishingly unlikely, and will be at a massive selective disadvantage against a process that doesn’t care and races to superintelligence regardless. But that’s the RSI danger: you don’t start RSI before you know that alignment also gets solved, be it in advance or on the way.