ChristianKl comments on I think I’m just confused. Once a model exists, how do you “red-team” it to see whether it’s safe. Isn’t it already dangerous?

ChristianKl 18 Nov 2023 23:05 UTC
2 points
I think you get the point but say openAI “trains” GPT-5 and it turns out to be so dangerous that it can persuade anybody of anything and it wants to destroy the world.

Most dangerous scenarios are not as straightforward as one where GPT-5 has a clear goal to destroy the world. If that's what the model does, it's also fairly straightforward to just delete the model.