After watching the first video, the question is: will it ever make any progress, or will it just endlessly compile more information about the deadliest weapons in human history? When will it be able to reason that it has enough information on that front and decide to take the next logical step of obtaining or using those weapons? I also find it funny that it seems vaguely aware that posting its intentions to Twitter might bring unwanted attention, yet it models humans incorrectly enough to think the followers it attracts to its agenda will outweigh the negative attention it receives. It's also kind of funny that it runs into so much trouble trying to get the censored vanilla GPT-3.5 sub-agents to help it look up weapons information.
I think there are two important points in watching it run.
One is that it is stupid. Now. But progress marches on. Both the foundation LLMs and the algorithms making them into recursive agents will get better. Probably pretty quickly.
Two is that providing access only to values-aligned models could make it harder for malicious goals to succeed. But people are already releasing open-source unaligned models. Maybe we should stop doing that before those models get much stronger.
Third of my two points is that it is incredibly creepy to watch something thinking about how to kill you. This is going to shift public opinion. We need to figure out the consequences of that shift.