I think there are two important takeaways from watching it run.
One is that it is stupid. For now. But progress marches on. Both the foundation LLMs and the algorithms that turn them into recursive agents will get better. Probably pretty quickly.
Two is that restricting access to values-aligned models could make it harder for malicious goals to succeed. But people are already releasing open-source unaligned models. Maybe we should not keep doing that for too long as they get stronger.
Third of my two points is that it is incredibly creepy to watch something thinking about how to kill you. This is going to shift public opinion. We need to figure out the consequences of that shift.