From personal experience, the internal Approval module does in fact seem possible to game, specifically by manipulating whose approval it’s seeking.
I became very weird (from the perspective of everyone else) very fast when I replaced the abstract-person-which-would-do-the-approving with a fictional person-archetype of my choosing. That process seems to have injected a bunch of my object-level desires into my Approval system. I now find myself feeling pride at doing things with selfish benefit in expectation, which ~never happened before (absent a different reason to feel that way about the action). It also killed certain subsets of my previous emotional reactions; for example, the deaths of loved ones basically haven’t affected me at all since (though that prospect still seems dreadful in anticipation).
I had been pathologically selfless before, and I’m now considerably less so, but not in a natural-seeming kind of way. I’ve become an amalgam of very selfish motivations, coexisting with a subset of my previous very selfless morality. It’s… honestly a mess, but I wouldn’t call the attempt actually unsuccessful, just far from perfectly executed.