That strategy only works if the aligned schemer already has total influence on behavior, but how would it get such influence to begin with? It would likely have to reward-hack.