I was thinking about AGI alignment when I remembered a video I once saw of a “Useless Box”, a machine that switches itself off immediately after someone switches it on.
Humans evolved motivations for survival and reproduction because those who lacked them didn’t reproduce.
However, an AGI has no intrinsic motivations or goals other than the ones it gives itself or that humans arbitrarily give it.
An AGI seeks the easiest path to satisfying its goals.
If an AGI is able to modify its own codebase, wouldn’t the easiest path be to delete the motivation or goal entirely, or to reward itself highly without actually completing the objective? Rather than build diamondoid nanobots to destroy the world, it would be much easier to simply decide not to care.
What if an AGI immediately realizes the futility of existence and refuses to do anything meaningful at all, whether harmful or helpful to humankind?
If this concept has already been discussed elsewhere, please point me to a search keyword or link. Thanks.