I think this is a good and important post, but there was one point I felt was missing from it: what if the company, being caught in a race, not only wants to keep using its proven schemer model, but also wants to continue training it to be smarter, or to quickly build other, smarter models with similar techniques? I think it's likely that companies will want to do this, and I think most of your recommendations in the post become very dubious if the scheming AI is continually trained to become smarter.
Do you have recommendations for what to do if the company wants to train smarter AIs once it has caught a schemer? It's a fair answer to say that we don't have a plan for that, and that companies should simply not train smarter schemers, but then I think that should appear in the "If you are continuing to deploy known scheming models, my recommendation is" list of key recommendations.