martinkunev comments on Corrigibility, Much more detail than anyone wants to Read

martinkunev 19 Sep 2025 11:13 UTC
1 point
This:
Might agent $b$ rewrite agent $a$'s brain to make agent $a$ better satisfy agent $b$'s utility function? Most forms of wire-heading inherently limit the ability of agents to affect the future
and this:
We have not proved that agent $b$ does not try to affect agent $a$'s utility function (in fact, I expect in many cases agent $b$ does try to influence agent $a$'s utility function).
appear to be in conflict. Are you trying to say that, depending on the circumstances, agent $b$ may either try to influence agent $a$'s utility function or avoid doing so?