In the spirit of "Self-fulfilling misalignment data might be poisoning our AI models": what are historical examples of self-fulfilling prophecies that have affected AI alignment and development?
Put a few potential examples below to seed discussion.
https://x.com/sama/status/1621621724507938816
https://www.lesswrong.com/posts/TcfyGD2aKdZ7Rt3hk/alignment-pretraining-ai-discourse-causes-self-fulfilling
Training on Documents About Reward Hacking Induces Reward Hacking
Situational Awareness and race dynamics? h/t Jan Kulveit (@Jan_Kulveit)
Situational Awareness probably caused Project Stargate to some extent. Getting the Republican party to take AI seriously enough to let them launch in the White House is no joke and less likely without the essay.
It also started the website-essay meta which is part of why AI 2027, The Compendium, and Gradual Disempowerment all launched the way they did, so there are knock-on effects too.
nostalgebraist’s self-fulfilling way of getting personas into a language model
https://x.com/g_leech_/status/1984587261120233577
Aaron Silverbrook is going for it:
(tweet)
https://www.hyperstitionai.com
Superintelligence Strategy is pretty explicitly trying to be self-fulfilling, e.g. “This dynamic stabilizes the strategic landscape without lengthy treaty negotiations—all that is necessary is that states collectively recognize their strategic situation” (a strategic situation the paper itself is arguing into popular existence)
https://x.com/saffronhuang/status/1907863453009867183
https://x.com/AnthropicAI/status/2052808801040859392 ?
https://x.com/tomekkorbak/status/2038704753887379891
https://alignment.openai.com/how-far-does-alignment-midtraining-generalize/
https://x.com/nabeelqu/status/2035051760746659904
Promoting enmity and bad vibes around AI safety (more in Richard Ngo’s thread: @RichardMCNgo quote-tweeting Sam Altman @sama, Mar 13, 2022)
gavin leech:
https://x.com/g_leech_/status/2027416051198083142
https://www.darioamodei.com/essay/the-adolescence-of-technology
Grok’s system prompts lately, which read kind of like “don’t think about elephants” instructions
https://x.com/bitcloud/status/1942792945238983022