The part about the gym really resonates with me; I personally find it almost impossible to focus when people are around and the environment isn't stable enough.
But I have to push back a little on your alignment idea (assuming I didn't misunderstand it): you still have to deal with corrigibility. If an AI has a different utility function from the system as a whole, it will try to resist having its utility function altered, and depending on how powerful it is, it might just take over the system altogether.
The idea of having multiple different systems monitoring and steering each other so that alignment emerges naturally would require you to predict the final equilibrium in advance, and for that equilibrium to be favorable. For a system this complicated, there are just too many failure points to consider.
For all you know, the system might just settle on gaming the reward function, perhaps with one or a few of its components circumventing all the safeguards.
I think your idea might work for subhuman or maybe early AGI systems, but once the AIs figure out what system they are embedded in and how it conflicts with their own utility functions, you will have a very hard time keeping them in check.
Also, you should change the name; DNA is a terrible name.