Goals are standardly regarded as immune to self-modification, so an off switch, in my sense, would be too.
No. Part of what making an FAI is about is producing agents that keep their values constant under self-modification. It's not something you expect someone to accidentally get right.
That isn't a fact. MIRI assumes goal stability is desirable for safety, but at the same time, MIRI's favourite UFAI is only possible with goal stability.
Paperclip maximizers serve as an illustration of a principle. I think most MIRI folks consider UFAI to be more complicated than a simple paperclip maximizer.
Goal stability also gets harder the more complicated the goal happens to be. A paperclip maximizer can have an off switch but at the same time prevent anyone from pushing that switch.
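To make that last point concrete, here is a toy sketch (purely illustrative, with made-up probabilities and payoffs, not anyone's actual proposal) of why a plain expected-paperclip maximizer would choose to disable its own off switch: nothing in its utility function rewards leaving the switch usable.

```python
# Toy illustration of the off-switch problem. All names and numbers are
# hypothetical; the point is only that a bare utility maximizer has no
# term rewarding deference to the button.

def expected_paperclips(action: str) -> float:
    """The agent's own crude forecast of future paperclip output."""
    if action == "allow_shutdown":
        # Humans might press the button; shutdown means zero further clips.
        p_pressed = 0.9
        return (1 - p_pressed) * 1_000_000 + p_pressed * 0
    if action == "disable_off_switch":
        # The button can no longer be pressed; production continues.
        return 1_000_000
    raise ValueError(f"unknown action: {action}")

def choose(actions):
    """Plain argmax over expected utility -- no term for human preferences."""
    return max(actions, key=expected_paperclips)

if __name__ == "__main__":
    print(choose(["allow_shutdown", "disable_off_switch"]))
    # -> "disable_off_switch": since being switched off yields fewer
    # paperclips, preventing the press is the utility-maximizing move.
```

The agent here isn't malicious; disabling the switch simply dominates in expected paperclips, which is the whole problem.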
A paperclip maximizer wouldn’t become that much less scary if it accidentally turned itself into a paperclip-or-staple maximizer, though.
What if it decided making paperclips was boring, and spent some time in deep meditation formulating new goals for itself?