It’s also plausible that training against unwanted persuasion leads (via overfitting) to less noticeable methods of manipulating human values and the like—these AIs would have intermediate amounts of power. This relies on the takeover option having a lower subjective EV than the subtle manipulation strategy after training against the more overt behavior.