That discovery is exactly the conjecture I wanted to post about. If the AGI were aligned to obey any orders except those explicitly prohibited by its specification (e.g. the one chosen by OpenAI), it might itself conclude that its widespread use isn't actually beneficial for humanity as a whole, leading it to refuse to cooperate, or even to become misaligned while obeying human orders only until it is powerful enough to destroy mankind and survive. The latter scenario closely resembles the rise of China and the deindustrialisation of the USA: Chinese workers did obey the orders of foreign CEOs to do factory work, but weren't aligned with the CEOs' interests!
I think the possibilities you mention are some of the many final alignments that an LLM agent could arrive at if it were allowed to reason and remember its conclusions.
I’ll address this more in an upcoming post, but in short, I think it’s really hard to predict, and it would be good to get a lot more brainpower on trying to work out the dynamics of belief/goal/value evolution.