Holy cow, you’ve walked the exact path I’m walking right now! I used to be super into Bayesian epistemics, AI risk, etc. Then my world model kept seeing … large prediction errors. I quit AI risk and quant trading, and now see most of our institutions (science, press) as religious/fallible. Now I’m super into vipassana, storytelling, Kegan levels, NVC, IFS, form is emptiness. I’m even considering training at a monastery.
I have a couple questions for you:
“off-beat frameworks like … among other things”: I’ve found all of these to be super powerful frameworks. Any chance you could rattle off the next half dozen things that come to mind? Each of these took me so long to discover alone.
Any good reading on circling, spiral dynamics, or chakras off the top of your head?
How do you think about impact when going for arahantship, or do you reject the frame? I’d love to do this too, but I think I could do an (actually) impactful startup.
Another thing I’m interested in is making System 1 models more palatable to System 2 minded people. Because, man, I was ignorant for a year or two, and people in my orbit still think I’m crazy for going on about vipassana and Kegan levels. This is more a word salad for other people with similar interests, but it may interest you too:
Form is emptiness / map is not territory / “Discard Ignorance”: you can argue this information-theoretically, since any model of the world that is fully true would have to be a perfect compression of the world (or a perfect Markov blanket), which isn’t possible; thus it’s leaky abstractions all the way down (a quick sketch below).
Lowering inferential distance to the phenomenology of the path to stream entry: predictive processing and drug-induced hallucinations make jhanas and magick seem less crazy; mindstream = event loop explains impermanence; active inference’s free energy = dukkha; sankharas (craving/aversion) = trigger-action patterns encoding a primal pleasure/pain system, mediated via somatic sensation for evolutionary-history reasons; Buddhist institutions converge (almost scientifically, by only passing down things that work) on good models of phenomenology, but not on untestable metaphysical claims (hungry ghost realm etc.).
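For the System-2-minded reader, here is one way to make that first point precise, using nothing beyond the standard cross-entropy identity (my framing, not anything from the traditional side):

$$\mathbb{E}_{x \sim p}\!\left[-\log q(x)\right] \;=\; H(p) \;+\; D_{\mathrm{KL}}\!\left(p \,\|\, q\right)$$

Any model $q$ of the territory $p$ pays the irreducible entropy $H(p)$ plus a divergence penalty that vanishes only when $q = p$ exactly, i.e. when the map just is the territory. A finite model of an open-ended world keeps a nonzero KL term somewhere, which is the information-theoretic reading of “leaky abstractions all the way down”.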
For a mix of legal and reputational reasons, I will have to be a little vague in describing the experiences I had.
Part 1: social forces (outside view)
There were a bunch of social forces that led me to form an outside view that the AI risk community doesn’t converge to truth. I’ll describe one: storytelling.
One thing I’ve learned is that the stories we tell ourselves are models of reality. Life experiences are too numerous to hold in full; every narrative is a simplification, and there are many narratives (models) that fit the experiences (data). And the default process humans use to construct narratives is very unscientific: lots of rationalising, forgetting, and selecting whichever frame generates the best emotional response. Our narratives and ontologies end up contoured around our revealed preferences, while we perceive ourselves as virtuous. (This is where Kegan levels, equanimity meditation and narrative therapy come in.)
So often, when I see people explain their choices, their narrative seems better explained by “trying to justify their job” or “trying to feel good about themselves” than by “genuine truth-seeking and abyss staring”. It’s a subconscious process, and the truth hurts.
There were a bunch of things I saw or heard about in the AI risk community that didn’t seem right. For example: I was abused in college; their friends enabled them; I self-ostracised; they pursued people for two more years. It’s hard when a personality disorder is involved, and we were all kids, so I’ve forgiven. But when I heard rumours about similar things in AI risk, like the TIME article, and saw a (still-)high-status man inappropriately touch a young woman, you have to wonder: if the supposed adults in the room don’t have the EQ to avoid bungling abuse allegations or taking money from crypto moguls, and don’t take responsibility, do they have the self-awareness to pivot when their research agendas are failing? Or are they telling themselves stories there too? I truly love these people, but I don’t want to be one.
I saw similar things in an overlapping community where, if I could legally talk about it, it would be front-page news in the New York Times. Around this time I started finding Eric Jang’s haikus and Le Guin’s Omelas very relatable.
I don’t know if you can logic someone into my perspective here. The bottleneck is abyss-staring, and some of that staring was only possible because of the deep equanimity I gained from my meditation practice. If an unpleasant hypothesis generates such strong emotional/ruminatory reactions that it can’t be stably held in your mind, you will stay in denial in the world where it is true. Whether that denial manifests as getting uncomfortable, abruptly changing the topic, rationalising it away, or nit-picking an irrelevant detail doesn’t matter.
The door to the path I’ve taken started with meditating a ton, and then pouring equanimity into the thoughts that made me most uncomfortable.
Part 2: my model of AIs (inside view)
I’ve been thinking about ML since 2018, and my high level take is “you get exactly what you wished for”. Models are models of the data: they are shaped by the loss function, regularities in the data set, the inductive biases their functional form encodes, and that’s ~it. Like, the “it” in AI models is the dataset, or Hutter’s “compression is intelligence”.
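To spell out the compression link (this is just the standard arithmetic-coding identity, stated as a sketch rather than Hutter’s full argument): an ideal arithmetic coder driven by the model’s next-token predictions spends

$$\ell(x_{1:T}) \;=\; \sum_{t=1}^{T} -\log_2 q_\theta\!\left(x_t \mid x_{<t}\right)\ \text{bits}$$

to losslessly encode the dataset, which is exactly the model’s summed log loss. Minimising training loss and compressing the training distribution are the same objective, so whatever the model “is”, it’s pinned down by the data, the loss, and the inductive biases.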
If you take Anthropic’s circuit model of LLMs, circuits need to pay rent in the form of decreasing loss. There’s no deception unless you trained on deceptive data or created an inductive bias towards deception; any model capacity allocated towards deception would be optimised out otherwise. Every time we do mech interp, we see models implementing the most algorithmically simple/efficient way to encode grammar or Othello board states (before you mention Neel’s modular addition circuit: trigonometry soup was the simplest solution for that particular functional form and input/output encoding :P). Point is, there’s no room for anything suspicious to hide.
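Here’s a toy sketch of the “capacity has to pay rent” intuition. It assumes explicit L2 weight decay and plain gradient descent on a made-up regression problem (the setup and names are mine, not from any mech interp paper), so it’s an intuition pump, not evidence about real training runs:

```python
import numpy as np

# Toy model: a parameter that lowers the loss is retained;
# one that doesn't is pushed towards zero by weight decay.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)   # feature the targets actually depend on
z = rng.normal(size=1000)   # feature the targets ignore
y = 3.0 * x                 # only x "pays rent"

w_useful, w_unused = 0.5, 0.5
lr, weight_decay = 0.01, 0.01

for _ in range(2000):
    err = w_useful * x + w_unused * z - y
    # gradients of mean squared error plus L2 penalty
    w_useful -= lr * (2 * np.mean(err * x) + 2 * weight_decay * w_useful)
    w_unused -= lr * (2 * np.mean(err * z) + 2 * weight_decay * w_unused)

print(round(w_useful, 2), round(w_unused, 2))  # roughly 3 and roughly 0
```

The caveat is in the assumptions: real runs only approximate this picture, and the argument only bites if the suspicious circuit genuinely contributes nothing to the training loss.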
To believe that practice diverges from this idealised math, you’d have to believe something weird is going on in real training runs: Adam’s loss geometry (the compression-is-intelligence story assumes a KL-divergence-style loss), floating-point arithmetic, RLHF, etc. I’m glad people are thinking about this. But I’m personally unconvinced it’s super high value; it feels better explained by “generating a laundry list of hypotheticals” than by “genuine truth-seeking”.
The MIRI ontology has never felt to me like it maps well onto modern ML, but then I never really understood it.
Maybe we build active inference agents (fixed priors induce action), or get good at RL, or prompt-engineer models to be more agentic; it feels wrong to me to think these are x-risky, but I can’t put into words why. Runaway-growth models in particular feel very wrong to me: the world is so (so!) much more complicated and less digitised than people in SV tend to think.
The research agendas I’ve seen up close all contradict these personal views in some way.
I know there are many other schools of thought on AI risk, and I haven’t thought about them all, and I don’t doubt my model here has many problems. My only goal here is to convey that I have thought hard about the object level too; it is not just social ick.