Accidental AI Safety experiment by PewDiePie: He created his own self-hosted council of 8 AIs to answer questions. They voted and picked the best answer. He noticed they were always picking the same two AIs, so he discarded the others, automated the discard/replace process, and told the AIs about it. The AIs started talking about this “sick game” and scheming to prevent it. This is the video with the timestamp:
From the AIs’ messages seen in the video, it’s possible that he provided those instructions as a user prompt instead of a system prompt. I wonder if the same thing would’ve happened had they been given as the system prompt instead.
This experiment is pretty clever, no? I don’t think a total AI amateur would stumble onto it: either he’s been following this problem for quite some time, he read about it somewhere recently, or one of us AI safety nerds sponsored him. P = not sure, though; it’s not beyond what people with an investigative mindset might come up with.
He mentions he’s just learned coding, so I guess he had the AI build the scaffolding. But the experiment itself seems like a pretty natural idea; he literally likens it to a king’s council. I’m sure that once you have the concept, having an LLM code it is no big deal.