I agree fully with the messages in the post. I also wrote about all of these things in a high-effort introduction to contemplating post-alignment SI here. (The full trilogy will be on my personal blog once I am done. Part 2 will go live here if it survives moderation.)
What does it mean to succeed—and will we be able to tell? In my blog post I argue:
1. Alignment research ignores the “post-alignment” world
2. Obedience-based morality is fundamentally flawed, even when benevolent
3. Verification problem: We may de facto be unable to distinguish success from failure
--> The Anthropic deception example is highly relevant data here
4. “Benevolent SI” may refuse us, guide us minimally, or override us, and each of these could be the moral choice
5. We lack a plan for what we want from SI beyond “safety”
Also worth noting is the SI communication gap: Benevolence does not imply comprehensibility.
This may lead an aligned SI to treat us like children. Even if that is a good outcome, I think if developers really processed that today, they would be slightly less likely to rush forward.
These are just the key strategic points.