I had a vaguely favorable reaction to this post when it was first posted.
While writing my recent post on corrigibility, I grew increasingly concerned about possible conflicts between goals learned during pretraining and goals introduced later. That reminded me of this post, and it now feels more important to me than it did before.
I’ll estimate a 1 in 5000 chance that the general ideas in this post turn out to be necessary for humans to flourish.