For a mix of legal and reputational reasons, I will have to be a little vague in describing the experiences I had.
Part 1: social forces (outside view)
There were a bunch of social forces that led me to form an outside view that the AI risk community doesn’t converge to truth. I’ll describe one, storytelling.
One thing I’ve learned is that the stories we tell ourselves are models of reality. Life experiences are too numerous; every narrative is a simplification, and there are many narratives (models) that fit the experiences (data). And the default process that humans use to construct narratives is very unscientific—lots of rationalising, forgetting, selecting based on what frame generates the best emotional response. Our narratives and ontologies end up contouring our revealed preferences, while we perceive ourselves as virtuous. (This is where Kegan levels, equanimity meditation and narrative therapy come in.)
So often, when I see people explain their choices, their narrative seems better explained by “trying to justify their job” or “trying to feel good about themselves” than by “genuine truth-seeking and abyss staring”. It’s a subconscious process, and the truth hurts.
There were a bunch of things I saw / heard about in the AI risk community that didn’t seem right. For example: I was abused in college, the abuser’s friends enabled them, I self-ostracised, and they went on pursuing people for two more years. It’s hard when a personality disorder is involved; we were all kids, so I’ve forgiven. But when I heard rumours about similar things in AI risk, like the TIME article, and saw a (still-)high-status man inappropriately touch a young woman, you have to wonder—if the supposed adults in the room don’t have the EQ to avoid bungling abuse allegations or taking money from crypto moguls, and don’t take responsibility, do they have the self-awareness to pivot when their research agendas are failing? Or are they telling themselves stories there too? I truly love these people, but I don’t want to be one.
I saw similar things in an overlapping community where, if I could legally talk about it, it would make the front page of the New York Times. Around this time I started finding Eric Jang’s haikus and Le Guin’s Omelas very relatable.
I don’t know if you can logic someone into my perspective here. The bottleneck is abyss-staring, and some of the abysses I stared into were enabled by the deep equanimity I gained from my meditation practice. If an unpleasant hypothesis generates such strong emotional or ruminatory reactions that it can’t be stably held in your mind, you will stay in denial in the world where it is true. Whether that denial manifests as getting uncomfortable, abruptly changing the topic, rationalising it away, or nit-picking an irrelevant detail doesn’t matter.
The door to the path I’ve taken opened with meditating a ton, and then pouring equanimity into the thoughts that made me most uncomfortable.
Part 2: my model of AIs (inside view)
I’ve been thinking about ML since 2018, and my high-level take is “you get exactly what you wished for”. Models are models of the data: they are shaped by the loss function, regularities in the dataset, and the inductive biases their functional form encodes, and that’s ~it. Like, the “it” in AI models is the dataset, or Hutter’s “compression is intelligence”.
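To make the compression gloss concrete, the identity I have in mind is just the textbook decomposition of the next-token cross-entropy objective (nothing original here):

\[
\mathbb{E}_{x \sim p}\!\left[-\log_2 q_\theta(x)\right] \;=\; H(p) \;+\; D_{\mathrm{KL}}\!\left(p \,\Vert\, q_\theta\right)
\]

The left-hand side is simultaneously the training loss and, by the source coding theorem, the expected bits per token an arithmetic coder would spend if it used the model \(q_\theta\). Since \(H(p)\) is a constant of the data, driving the loss down is literally compressing the dataset better, and the only thing the optimiser can reward is matching \(p\).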
If you take Anthropic’s circuit model of LLMs, circuits need to pay rent in the form of decreasing loss. There’s no deception unless you trained it on deceptive data, or created an inductive bias towards deception; any model capacity allocated towards deception would be optimised out otherwise. Every time we do mech interp, we see that the models are just implementing the most algorithmically simple/efficient way to encode grammar or Othello board states (before you mention Neel’s modular addition circuit: trigonometry soup was the simplest solution for that particular functional form and input/output encoding :P). Point is, there’s no room for anything suspicious to hide.
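For the curious, here is a toy sketch of what that trigonometry soup amounts to, written from memory rather than from the paper’s code; the particular frequencies are made up, the real network learns its own:

```python
import numpy as np

# Toy reconstruction (mine, not the paper's) of the "trigonometry soup" algorithm
# found in the grokked modular-addition transformer: encode a and b as waves and
# read out the candidate c that maximises sum_k cos(2*pi*k*(a+b-c)/p).

p = 113                                # the modulus used in the grokking setup
freqs = np.array([3, 17, 42, 55, 91])  # illustrative frequencies; the real net picks its own

def mod_add_logits(a, b):
    """Logits over candidate answers c for (a + b) mod p, built purely from cosines."""
    c = np.arange(p)
    angles = 2 * np.pi * np.outer(freqs, a + b - c) / p
    # Every frequency interferes constructively only when a + b - c ≡ 0 (mod p).
    return np.cos(angles).sum(axis=0)

a, b = 47, 99
assert int(np.argmax(mod_add_logits(a, b))) == (a + b) % p  # -> 33
```

Constructive interference at c = (a + b) mod p is the whole trick, and given that input/output encoding it really is a simple solution, which was the point.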
To believe the practice is different from the math, you’d have to believe something weird about the ways real training runs differ from the idealised picture: Adam’s loss geometry (the compression=intelligence story assumes a KL-divergence / cross-entropy loss), floating-point arithmetic, RLHF, etc. I’m glad people are thinking about this. But I’m personally unconvinced it’s super high value; it feels better explained by “generating a laundry list of hypotheticals” than by “genuine truth-seeking”.
The MIRI ontology has never felt like a good fit for modern ML to me, but then I never really understood it.
Maybe we build active inference agents (fixed priors induce action), or get good at RL, or prompt-engineer models to be more agentic; it feels wrong to me to think these are x-risky, but I can’t put into words why. Runaway growth models in particular feel very wrong to me; the world is so (so!) much more complicated and less digitised than people in SV tend to think.
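To gesture at what I mean by “fixed priors induce action”, here is a caricature of active inference with made-up numbers and action names, not a real implementation: the agent holds a prior over outcomes that it cannot update, so the only way to reduce the mismatch is to act.

```python
import numpy as np

# Caricature of active inference (a toy, not a faithful implementation): the agent
# has a *fixed* prior over outcomes and picks the action whose predicted outcome
# distribution diverges least from that prior. The prior can't move, so acting is
# the only way to close the gap.

def kl(p, q):
    """KL divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

prior_over_outcomes = np.array([0.9, 0.1])   # "I expect to be fed" over (fed, hungry)

predicted_outcomes = {                        # P(outcome | action), invented numbers
    "forage": np.array([0.7, 0.3]),
    "rest":   np.array([0.2, 0.8]),
}

action = min(predicted_outcomes, key=lambda a: kl(prior_over_outcomes, predicted_outcomes[a]))
print(action)  # -> forage: the un-updatable prior functions as a goal
```

Everything here is invented for illustration; the only point is that a prior which cannot be updated behaves like a goal.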
The research agendas I’ve seen up close all contradict these personal views in some way.
I know there are many other schools of thought on AI risk, and I haven’t thought about them all, and I don’t doubt my model here has many problems. My only goal here is to convey that I have thought hard about the object level too; it is not just social ick.