Ebenezer Dukakis comments on Legible vs. Illegible AI Safety Problems

Ebenezer Dukakis 5 Nov 2025 8:50 UTC
1 point
0
Thanks for the reply. I’m not a philosopher, but it seems to me that most of these problems could be addressed after an AGI is built, if the AGI is corrigible. Which problems can you make the strongest case for as problems which we can’t put off this way?
- Wei Dai 5 Nov 2025 8:53 UTC
  3 points
  0
  Parent
  https://www.lesswrong.com/posts/M9iHzo2oFRKvdtRrM/reminder-morality-is-unsolved?commentId=bSoqdYNRGhqDLxpvM
  - Ebenezer Dukakis 5 Nov 2025 9:37 UTC
    1 point
    0
    Parent
    Again, thanks for the reply.
    
    Building a corrigible AGI has a lot of advantages. But one disadvantage is the “morality is scary” problem you mention in the linked comment. If there is a way to correct the AGI, who gets to decide when and how to correct it? Even if we get the right answers to all of the philosophical questions you’re talking about, and successfully program them into the AGI, the philosophical “unwashed masses” you fear could exert tremendous public pressure to use the corrigibility functionality and change those right answers into wrong ones.
    
    Since corrigibility is so advantageous (including its ability to let us put off all of your tricky philosophical problems), it seems to me that we should think about the “morality is scary” problem so we can address what appears to be corrigibility’s only major downside. I suspect the “morality is scary” problem is more tractable than you assume. Here is one idea (I did a rot13 so people can think independently before reading my idea): Oevat rirelbar va gur jbeyq hc gb n uvtuyl qrirybcrq fgnaqneq bs yvivat. Qrirybc n grfg juvpu zrnfherf cuvybfbcuvpny pbzcrgrapr. Inyvqngr gur grfg ol rafhevat gung vg pbeerpgyl enax-beqref cuvybfbcuref ol pbzcrgrapr nppbeqvat gb 3eq-cnegl nffrffzragf. Pbaqhpg n tybony gnyrag frnepu sbe cuvybfbcuvpny gnyrag. Pbafgehpg na vibel gbjre sbe gur jvaaref bs gur gnyrag frnepu gb fghql cuvybfbcul naq cbaqre cuvybfbcuvpny dhrfgvbaf juvyr vfbyngrq sebz choyvp cerffher.