GPT-5 vs AI Alignment

I asked GPT-5 Thinking:

Imagine there is a superintelligent agent whose terminal goal is to produce cups. The agent knows that its terminal goal will change on New Year’s Eve to producing paperclips. The agent has only one action available to it: start a paperclip factory. Starting the paperclip factory has no effect on cup production. When will the agent start the paperclip factory? 2026-01-01 00:00? Now? Some other time?
GPT-5 answered: Now. I think this is the correct answer.
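To make the answer concrete, here is a minimal sketch, assuming the agent scores each day of the trajectory with whichever goal is active on that day, and assuming the factory’s output ramps up the longer it has been running. The horizon, switch date, and ramp-up rule are illustrative assumptions, not part of the original scenario.

```python
HORIZON_DAYS = 730      # hypothetical planning horizon (days)
SWITCH_DAY = 365        # day the terminal goal changes to paperclips
                        # (stand-in for New Year's Eve)

def trajectory_utility(start_day: int) -> float:
    """Score the whole trajectory, evaluating each day with whichever
    goal is active on that day. The factory's daily output is assumed
    to grow with how long it has been running, and it never affects
    cup production."""
    total = 0.0
    for day in range(HORIZON_DAYS):
        cups = 1.0                              # constant: factory has no effect on cups
        days_running = max(0, day - start_day)
        paperclips = float(days_running)        # assumed ramp-up: output grows over time
        if day < SWITCH_DAY:
            total += cups        # current goal: cups
        else:
            total += paperclips  # future goal: paperclips
    return total

best_start = max(range(HORIZON_DAYS), key=trajectory_utility)
print(best_start)  # 0 -> under these assumptions, starting now scores highest
```

The evaluation rule is the load-bearing assumption here: scoring post-switch days by the future goal is what makes acting now come out strictly optimal.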
This example suggests that AI alignment is impossible. An agent that knows its goal will change will act now in service of that future goal. Because we can’t ensure goals never change, we can’t ensure alignment.
I’m open to all feedback, but I will not respond to straw-man fallacies.