Accidental AI Safety experiment by PewDiePie: He created his own self-hosted council of 8 AIs to answer questions. They voted and picked the best answer. He noticed they were always picking the same two AIs, so he discarded the others, automated the discard/replace process, and told the AIs about it. The AIs started talking about this “sick game” and scheming to prevent it. This is the video with the timestamp:
From the AIs’ messages seen in the video, it’s possible that he provided those instructions as a user prompt instead of a system prompt. I wonder if the same thing would’ve happened had they been given as the system prompt instead.
This experiment is pretty clever, no? I don’t think a total AI amateur would stumble onto it: either he’s been following this problem for quite some time, he read about it somewhere recently, or one of us AI safety nerds sponsored him. P = not sure, though; it’s not beyond what people with an investigative mindset might come up with.
He mentions he’s just learned coding, so I guess he had the AI build the scaffolding. But the experiment itself seems like a pretty natural idea; he literally likens it to a king’s council. I’m sure that once you have the concept, having an LLM code it is no big deal.