When open-source language model agents were developed, the first goals given were test cases like “A holiday is coming up soon. Plan me a party for it.” Hours later, someone came up with the clever goal “Make me rich.” Less than a day after that, we finally got “Do what’s best for humanity.” About three days later, someone tried “Destroy humanity.”
None of these worked very well, because language model agents aren’t very good (yet?). But it’s a good demonstration that people will deliberately build agents and give them open-ended goals. And that some small fraction of people would tell an AI “Destroy humanity.” That’s maybe not the central concern, though, because the “Do what’s best for humanity” people were more numerous and had a three-day lead.
The bigger concern is that the “Plan my party” and “Make me rich” people were also numerous and also had a lead, and those might be dangerous goals to give to a monomaniacal AI.