Alex Mallen comments on The behavioral selection model for predicting AI motivations

Alex Mallen 7 Jan 2026 4:09 UTC
LW: 2 AF: 1
That strategy only works if the aligned schemer already has total influence on behavior, but how would it get such influence to begin with? It would likely have to reward-hack.