I think we’ll get some more scares from systems like AutoGPT. Watching an AI think to itself, in English, is going to be powerful. And when someone hooks one up to an untuned model and asks it to think about whether and how to take over the world, I think we’ll get another media event. For good reason.
I think actually making such systems, while the core LLM is still too dumb to succeed at taking over the world, might be important.
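For anyone who hasn't looked at how these scaffolds work, here is a minimal sketch of the loop being described: the scaffold asks the model to narrate its reasoning in English and name a command, executes that command, and feeds the observation back. The `query_model` and `run_command` functions are hypothetical stand-ins, not any real project's API; the JSON schema is likewise illustrative, not AutoGPT's actual format.

```python
# Minimal sketch of an AutoGPT-style scaffold: the model is asked to "think
# out loud" in English, then name a command for the scaffold to execute.
# query_model and run_command are stand-ins, not any real library's API.

import json

SYSTEM_PROMPT = (
    "You are an autonomous agent. Each turn, reply with JSON containing "
    '"thoughts" (your reasoning in plain English), "command" '
    '("search", "write_file", or "finish"), and "argument".'
)


def query_model(messages):
    """Stand-in for a chat-completion call; returns a canned reply here."""
    return json.dumps(
        {"thoughts": "This is only a demo, so I should stop.",
         "command": "finish", "argument": ""}
    )


def run_command(command, argument):
    """Stand-in for the scaffold's tool dispatch (search, file I/O, etc.)."""
    return f"(result of {command}({argument!r}))"


def run_agent(goal, max_steps=10):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Goal: {goal}"},
    ]
    for _ in range(max_steps):
        reply = query_model(messages)
        step = json.loads(reply)
        # This is the point above: the agent's reasoning arrives as ordinary
        # English text that a human (or the media) can read directly.
        print("THOUGHTS:", step["thoughts"])
        if step["command"] == "finish":
            return
        observation = run_command(step["command"], step["argument"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Observation: {observation}"})


if __name__ == "__main__":
    run_agent("Summarize today's AI news")
```

The structured reply is just one convention for separating the think-aloud text from the action; the essential feature for the argument here is only that the reasoning is emitted as readable English before anything is executed.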
Has chain-of-thought had the effect you anticipated?
Good question.
Decidedly not.
One big factor is that scaffolded agents just didn’t work nearly as well or easily as I thought they would. Here’s an analysis of why:
Have agentized LLMs changed the alignment landscape? I’m not sure.
The other factor is that nobody else has AFAIK done something like ChaosGPT, where they give an agent a misaligned goal and then publicize a transcript of it thinking and trying to accomplish that. I’m disappointed but not surprised, because there’s a good chance of being criminally liable for doing it, particularly if the agent has a hope in hell of actually doing something harmful—which is exactly when it would be more impactful.
I totally agree that it might be good to have such a fire alarm as soon as possible, and seeing how fast people are making GPT-4 more and more powerful makes me think it's only a matter of time.