One of my old blog posts I never wrote (I did not even list it in a “posts I will never write” document) is one about how corrigibility is anti-correlated with goal security.
Something like: if you build an AI that doesn’t resist someone trying to change its goals, it will also not try to stop bad actors from changing its goals. (I don’t think this particular worry applies to Paul’s version of corrigibility, but this blog post idea was from before I learned about his definition.)