In the spirit of "Self-fulfilling misalignment data might be poisoning our AI models": what are historical examples of self-fulfilling prophecies that have affected AI alignment and development?
I've put a few potential examples below to seed discussion.
https://x.com/sama/status/1621621724507938816
Training on Documents About Reward Hacking Induces Reward Hacking
Situational Awareness and race dynamics? h/t Jan Kulveit @Jan_Kulveit
Situational Awareness probably contributed to Project Stargate to some extent. Getting the Republican party to take AI seriously enough to host the launch at the White House is no small feat, and it was less likely without the essay.
It also started the standalone-website essay meta, which is part of why AI 2027, The Compendium, and Gradual Disempowerment all launched the way they did, so there are knock-on effects too.
Superintelligence Strategy is pretty explicitly trying to be self-fulfilling, e.g. “This dynamic stabilizes the strategic landscape without lengthy treaty negotiations—all that is necessary is that states collectively recognize their strategic situation” (a strategic situation whose existence the paper itself is the main thing popularizing).
https://x.com/saffronhuang/status/1907863453009867183
Grok's recent system prompts, which read a bit like “Don’t think about elephants”: telling the model what not to do can make exactly that behavior more salient.
https://x.com/bitcloud/status/1942792945238983022