> AIs with orthogonal goals. If the AIs’ goals are very different to each other, being willing to forgo immediate rewards is less likely.
Seems worth linking to this post for discussion of ways to limit collusion. I would also point to this relevant comment thread. It seems to me that orthogonal goals are not what we want, as agents with orthogonal goals can cooperate pretty easily on actions that each of them sees as either favorable or neutral. Instead, we would want agents with exactly opposite goals, if such a thing is possible to design. It seems possible to me that in the nearcast framing we might get agents for which this collusion is not a problem, especially by using the other techniques discussed in this post and Drexler’s, but giving our AIs orthogonal goals seems unlikely to help in dangerous situations.
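To make the contrast concrete, here is a toy numerical sketch (with hypothetical, randomly drawn utilities, not anything from the original post): if two agents' utilities over candidate collusive actions are independent ("orthogonal"), a sizable fraction of actions is favorable to one and at worst neutral to the other, so collusion is easy; if one agent's utility is exactly the negative of the other's, no such mutually acceptable action exists.

```python
import numpy as np

# Toy sketch with made-up numbers: each entry is the utility an overseer AI
# assigns to one of 1000 candidate "collusive" actions.
rng = np.random.default_rng(0)
u1 = rng.normal(size=1000)            # agent 1's utility for each action

u2_orthogonal = rng.normal(size=1000) # independent draws: orthogonal goals
u2_opposite = -u1                     # exactly opposite goals: u2 = -u1


def fraction_mutually_acceptable(u_a, u_b):
    """Fraction of actions neither agent objects to and at least one strictly prefers."""
    ok = (u_a >= 0) & (u_b >= 0) & ((u_a > 0) | (u_b > 0))
    return ok.mean()


print("orthogonal goals:", fraction_mutually_acceptable(u1, u2_orthogonal))
print("opposite goals:  ", fraction_mutually_acceptable(u1, u2_opposite))
# With independent (orthogonal) utilities roughly a quarter of random actions
# are acceptable to both agents; with exactly opposite utilities essentially
# none are, since any action one agent gains from, the other loses from.
```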