MichaelStJules comments on When would AGIs engage in conflict?

MichaelStJules 27 Sep 2023 0:15 UTC
Hmm, if A is simulating B with B's source code, couldn't the simulated B figure out that it's being simulated and then lie about its decisions or hide what its actual preferences are? Or would its actual preferences be derivable directly from its weights or code, without simulation?