AI notkilleveryoneism researcher, focused on interpretability.
Personal account, opinions are my own.
I have signed no contracts or agreements whose existence I cannot mention.
I don’t think it proves too much. Informed decision-making comes in degrees, and some domains are just harder? Like, I think my threshold for leaving people free to make their own mistakes if they are the only ones harmed by them is very low, compared to where the human population average seems to be at the moment. But my threshold is, in fact, greater than zero.
For example, there are a bunch of things I think bystanders should generally prevent four-year-old human children from doing, even if the children insist that they want to do them. I know that stopping four-year-old children from doing these things will be detrimental in some cases, and that having such policies is degrading to the children’s agency. I remember what it was like being four years old and feeling miserable because of kindergarten teachers who controlled my day and thought they knew what was best for me. I still think the tradeoff is worth it on net in some cases.
I just think that the suicide thing happens to be a case where doing informed decision-making is maybe just too tough for way too many humans and thus some form of ban could plausibly be worth it on net. Sports betting is another case where I was eventually convinced that maybe a legal ban of some form could be worth it.
I think very very many people are not making an informed decision when they decide to commit suicide.
For example, I think quantum immortality is quite plausibly a thing. Very few people know about quantum immortality and even fewer have seriously thought about it. This means that almost everyone on the planet might have a very mistaken model of what suicide actually does to their anticipated experience.[1] Also, many people are religious and believe in a pleasant afterlife. Many people considering suicide are mentally ill in a way that compromises their decision making. Many people think transhumanism is impossible and won’t arrange for their brain to be frozen for that reason.
I agree that there is some threshold on the fraction of ill-considered suicides relative to total suicides such that suicide should be legal if we were below that threshold. I used to think we were maybe below that threshold. After I began studying physics at uni and so started taking quantum immortality more seriously, I switched to thinking we are maybe above the threshold.
You might find yourself in a branch where your suicide attempt failed, but a lot of your body and mind were still destroyed. If you keep exponentially decreasing the amplitude of your anticipated future experience in the universal wave function further, you might eventually find that it is now dominated by contributions from weird places and branches far-off in spacetime or configuration space that were formerly negligible, like aliens simulating you for some negotiation or other purpose.
I don’t really know yet how to reason well about what exactly the most likely observed outcome would be here. I do expect that by default, without understanding and careful engineering that our civilisation doesn’t remotely have the capability for yet, it’d tend to be very Not Good.
Assuming that the bits to parameters encoding can be relaxed, there’s some literature about redundant computations in neural networks. If the feature vectors in a weight matrix aren’t linearly independent, for example, the same computation can be “spread” over many linearly dependent features, with the result that there are no free parameters but the total amount of computational work is the same.
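Here’s a tiny numpy sketch of that ‘spreading’ idea (my own toy illustration with made-up sizes, not a construction from that literature): the same scalar readout is implemented once by a single feature direction and once as a sum over several linearly dependent rows, so no parameter is left unused but the computed function is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)          # an example input
w = rng.normal(size=d)          # the "true" feature direction

# Implementation 1: a single row computes the feature once.
W_single = w[None, :]                      # shape (1, d)
y_single = (W_single @ x).sum()

# Implementation 2: the same computation "spread" over 4 linearly
# dependent rows. The coefficients sum to 1, so the summed output is
# identical, but every entry of W_spread now participates.
coeffs = np.array([0.1, 0.4, 0.2, 0.3])    # arbitrary coefficients summing to 1
W_spread = coeffs[:, None] * w[None, :]    # shape (4, d), all rows parallel to w
y_spread = (W_spread @ x).sum()

print(np.isclose(y_single, y_spread))      # True: same function, no unused rows
```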
There’s a few other cases like this where we know how various specific forms of simplicity in the computation map onto freedom in the parameters. But those are not enough in this case. We need more freedom than that.
If every bit of every weight were somehow used to store one bit of the program, excepting those weights used to simulate the UTM, that should suffice to derive the conjecture, yes.[1]
I think that’s maybe even harder than what I tried to do though. It’s theoretically fine if our scheme is kind of inefficient in terms of how much code it can store in a given number of parameters, so long as the leftover parameter description bits are free to vary.
There’d be some extra trickiness in that under these definitions, the parameters are technically real numbers and thus have infinitely many bits of storage capacity, though in real life they’re of course actually finite-precision floating point numbers.
The same could be said of transformers run in chain of thought mode. But I tried deriving conjecture 2 for those, and didn’t quite succeed.
The trouble is that you need to store the programs in the RNN/transformer weights, and do it in a way that doesn’t ‘waste’ degrees of freedom. Suppose for example that we try to store the code for the programs in the MLPs, using one ReLU neuron to encode each bit via query/key lookups. Then, if we have more neurons than we need because the program is short, we have a lot of freedom in choosing the weights and biases of those unnecessary neurons. For example, we could set their biases to some very negative value to ensure the neurons never fire, and then set their input and output weights to pretty much any values. So long as the weights stay small enough to not overwhelm the bias, the computation of the network won’t be affected by this, since the ReLUs never fire.
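Here’s a quick numpy sketch of the dead-neuron construction I’m describing (sizes made up purely for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
d, n_used, n_spare = 16, 4, 12

W_in = rng.normal(size=(n_used + n_spare, d))
b = rng.normal(size=n_used + n_spare)
W_out = rng.normal(size=(d, n_used + n_spare))

# Make the last n_spare neurons "dead": a very negative bias keeps the
# pre-activation below zero for any input of reasonable norm.
b[n_used:] = -1e6

x = rng.normal(size=d)
y = W_out @ relu(W_in @ x + b)

# Now freely change the dead neurons' input and output weights (keeping
# them small relative to the bias): the output is unchanged, so those
# parameters are effectively free to vary.
W_in2, W_out2 = W_in.copy(), W_out.copy()
W_in2[n_used:] = rng.normal(size=(n_spare, d))
W_out2[:, n_used:] = rng.normal(size=(d, n_spare))
y2 = W_out2 @ relu(W_in2 @ x + b)

print(np.allclose(y, y2))   # True: the dead ReLUs never fire
```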
The problem is that this isn’t enough freedom. To get the formula without an extra prefactor, we’d need the biases and weights for those neurons to be completely free, able to take any value in $\mathbb{R}$.
EDIT: Wrote the comment before your edit. No, I haven’t tried it for RNNs.
(Mediation)
Wait, so it’s enough for the agents to just believe the observables are independent given the state of their latents? We only need the $X_i$ to be independent conditional on $\Lambda$ under a particular model $M$?
I didn’t realise that. I thought the observables had to be ‘actually independent’ after conditioning in some sort of frequentist sense.
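To check my understanding, here is the toy picture I now have in mind of ‘independent conditional on $\Lambda$ under the model’ (my own made-up numbers, binary variables for simplicity):

```python
import numpy as np

# A toy model M: a binary latent L and two binary observables X1, X2
# that are independent *given* L under M.
p_L = np.array([0.6, 0.4])                 # P(L)
p_X1_given_L = np.array([[0.9, 0.1],       # P(X1 | L=0)
                         [0.2, 0.8]])      # P(X1 | L=1)
p_X2_given_L = np.array([[0.7, 0.3],
                         [0.1, 0.9]])

# Joint under M: P(L, X1, X2) = P(L) P(X1|L) P(X2|L)
joint = p_L[:, None, None] * p_X1_given_L[:, :, None] * p_X2_given_L[:, None, :]

# Mediation under M: P(X1, X2 | L) factorises into P(X1|L) P(X2|L).
for l in range(2):
    cond = joint[l] / joint[l].sum()
    assert np.allclose(cond, np.outer(p_X1_given_L[l], p_X2_given_L[l]))

# But the marginal P(X1, X2) does not factorise: X1 and X2 are correlated
# through L, and only become independent once we condition on it.
marg = joint.sum(axis=0)
print(np.allclose(marg, np.outer(marg.sum(axis=1), marg.sum(axis=0))))  # False
```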
Getting a version of this that works under approximate Agreement on Observables sounds like it would be very powerful then. It’d mean that even if Alice is much smarter than Bob, with her model e.g. having more FLOP which she can use to squeeze more bits of information out of the data, there’d still need to be a mapping between the concepts Bob and Alice internally use in those domains where Bob doesn’t do very much worse than Alice on predictive accuracy.
So, if a superintelligence isn’t that much better than humanity at modelling some specific part of reality, there’d need to be an approximate mapping between humanity’s latents and (some of) the superintelligence’s latents for that part of reality. That is, if the theorems approximately hold under approximate agreement on observables.
I am not so concerned about people getting extra time; for most tests, the deadline should be a mercy that keeps students from sitting there for days, rather than something that costs you a lot of points.
I can’t recall ever taking a test in school or university where time wasn’t a pretty scarce resource, unless it was easy enough that I could just get everything right before the deadline without needing to rush.
Dumbledore likely would have known what it meant, and I think Alastor at the very least would have put together the most crucial parts as well.
The part that was numb with grief and guilt took this opportunity to observe, speaking of obliviousness, that after events at Hogwarts had turned serious, they really really really REALLY should have reconsidered the decision made on First Thursday, at the behest of Professor McGonagall, not to tell Dumbledore about the sense of doom that Harry got around Professor Quirrell. It was true that Harry hadn’t been sure who to trust, there was a long stretch where it had seemed plausible that Dumbledore was the bad guy and Professor Quirrell the heroic opposition, but...
Dumbledore would have realised.
Dumbledore would have realised instantly.
For me that fell under ‘My simulation of Voldemort isn’t buying that he can rely on this, not for something so crucial.’
And the answer was: “All right. There is a curse on the Defence Professor position. There has always been a curse on the Defence Professor position. The school has adapted to it. Harry has gotten into just the right kind of shenanigan to cause McGonagall to panic about this, and give Harry the instructions he needs to hear to prevent him from just taking certain matters to McGonagall.”
The question I always had here was “But what was Voldemort’s original plan for dealing with this issue when he decided to teach at Hogwarts?”
Because I don’t think he would have wanted to stake all his plans for the stone and Harry on McGonagall coincidentally saying this just in time, and Harry coincidentally being in a state where he obeys her instruction and never rethinks that decision. And Voldemort would have definitely known about the resonance problem before coming to Hogwarts. Even if he thought it would be somehow gone after ten years, he would have realised after the encounter with Harry in Diagon Alley at the very latest that that wasn’t true. So what was his original plan for making sure Harry wouldn’t talk about the resonance to anyone important? Between the vow and the resonance itself, his means of reliably controlling Harry’s actions are really very sharply limited.
Every plan I’ve managed to come up with either doesn’t fit with Voldemort’s actual actions in the story, or doesn’t seem nearly reliable enough for my mental model of Voldemort to be satisfied with the whole crazy “Let’s just walk into Hogwarts, become a teacher, and hang out there for maybe a year” idea.
Eliezer: Right. But there’s more! This model also explains why, when Harry faces the Dementor and is lost in his dark side, and Hermione brings him out of it with a kiss,[18] Harry’s dark side has nothing to say about that kiss, it’s at a loss. Meanwhile, the main part of Harry has a thought process activated.
I picked up on this, though my main guess was that Tom Riddle had just always been aromantic and asexual. I didn’t think any dark rituals were involved.
I don’t think it’s true that Noosphere’s comment contained no argument. The rest of the comment after the passage you cited tries to lay out a model for why continual learning and long-term memory might be the only remaining bottlenecks. Perhaps you think that this argument is very bad, but it is an argument, and I did not think that your reply to it was helpful for the discussion.
My guess is this is obvious, but IMO it seems extremely unlikely to me that bee-experience is remotely as important to care about as cow experience.
I agree with this, but would strike the ‘extremely’. I don’t actually have gears-level models for how some algorithms produce qualia. ‘Something something, self-modelling systems, strange loops’ is not a gears-level model. I mostly don’t think a million-neuron bee brain would be doing qualia, but I wouldn’t say I’m extremely confident.
Consequently, I don’t think people who say bees are likely to be conscious are so incredibly obviously making a mistake that we have to go looking for some signalling explanation for them producing those words.
But there’s no reason to think that the model is actually using a sparse set of components/features on any given forward pass.
I contest this. If a model wants to implement more computations (for example, logic gates) in a layer than that layer has neurons, the known methods for doing this rely on few computations being used (that is, receiving a non-baseline input) on any given forward pass.
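Here’s a toy numpy sketch of the kind of thing I mean (my own illustration, random embeddings and made-up sizes): reading a feature out of a superposed representation works fine while only a few features are active, and degrades as more of them activate on the same forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_feats = 50, 200                     # more features than dimensions/neurons

# One random embedding direction per feature.
E = rng.normal(size=(n_feats, d)) / np.sqrt(d)

def readout_error(k):
    """Activate k features at value 1.0, read feature 0 back out with a
    naive dot product against its own direction, return the absolute error."""
    active = np.zeros(n_feats)
    active[:k] = 1.0                     # feature 0 plus k-1 interfering features
    x = active @ E                       # superposed representation in R^d
    est = x @ E[0] / (E[0] @ E[0])
    return abs(est - 1.0)

for k in [1, 2, 5, 20, 100]:
    print(k, readout_error(k))
# The error grows with the number of simultaneously active features:
# the interference is only tolerable when activations are sparse.
```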
I’d have to think about the exact setup here to make sure there are no weird caveats, but my first thought is that in this setup, this ought to be one component per bigram, firing exclusively for that bigram.
An intuition pump: Imagine the case of two scalar features $x_1, x_2$ being embedded along vectors $\vec{v}_1, \vec{v}_2$. If you consider a series that starts with $\vec{v}_1, \vec{v}_2$ being orthogonal, then gives them ever higher cosine similarity, I’d expect the network to have ever more trouble learning to read out $x_1$, $x_2$, until we hit cosine similarity $1$, at which point the network definitely cannot learn to read the features out at all. I don’t know how the learning difficulty behaves over this series exactly, but it sure seems to me like it ought to go up monotonically at least.
Another intuition pump: The higher the cosine similarity between the features, the larger the norms of the rows of the readout matrix will be, with the norms going to infinity in the limit of the cosine similarity going to one.
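A quick numpy check of that second intuition pump (my own sketch): take two unit feature directions at a given cosine similarity and look at the row norms of the least-squares readout, i.e. the pseudoinverse of the embedding.

```python
import numpy as np

def readout_row_norms(cos_sim):
    # Two unit feature directions in R^2 with the given cosine similarity.
    v1 = np.array([1.0, 0.0])
    v2 = np.array([cos_sim, np.sqrt(1.0 - cos_sim**2)])
    V = np.stack([v1, v2], axis=1)   # columns are the embedding directions
    W_read = np.linalg.pinv(V)       # least-squares readout: W_read @ (V @ x) = x
    return np.linalg.norm(W_read, axis=1)

for c in [0.0, 0.5, 0.9, 0.99, 0.999]:
    print(c, readout_row_norms(c))
# The row norms of the readout diverge as the cosine similarity approaches 1:
# recovering each feature requires ever larger, noise-amplifying weights.
```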
I agree that at a cosine similarity that low, it’s very unlikely to be a big deal yet.
Sure, yes, that’s right. But I still wouldn’t take this to be equivalent to our embedding vectors literally being orthogonal, because the trained network itself might not perfectly learn this transformation.
What do you mean by “a global linear transformation” as in what kinds of linear transformations are there other than this? If we have an MLP consisting of multiple computations going on in superposition (your sense) I would hope that the W_in would be decomposed into co-activating subcomponents corresponding to features being read into computations, and the W_out would also be decomposed into co-activating subcomponents corresponding to the outputs of those computations being read back into the residual stream. The fact that this doesn’t happen tells me something is wrong.
Linear transformations that are the sum of weights for different circuits in superposition, for example.
What I am trying to say is that I expect networks to implement computation in superposition by linearly adding many different subcomponents to create W_in, but I mostly do not expect networks to create W_out by linearly adding many different subcomponents that each read out a particular circuit output back into the residual stream, because that’s actually an incredibly noisy operation. I made this mistake at first as well. This post still has a faulty construction for W_out because of my error. Linda Linsefors finally corrected me on this a couple months ago.
As to the issue with the maximum number of components: it seems to me like if you have five sparse features (in something like the SAE sense) in superposition and you apply a rotation (or reflection, or identity transformation) then the important information would be contained in a set of five rank 1 transformations, basically a set of maps from A to B. This doesn’t happen for the identity, does it happen for a rotation or reflection?
I disagree that if all we’re doing is applying a linear transformation to the entire space of superposed features, rather than, say, performing different computations on the five different features, that it would be desirable to split this linear transformation into the five features.
Finally, as to “introducing noise” by doing things other than a global linear transformation, where have you seen evidence for this? On synthetic (and thus clean) datasets, or actually in real datasets? In real scenarios, your model will (I strongly believe) be set up such that the “noise” between interfering features is actually helpful for model performance, since the world has lots of structure which can be captured in the particular permutation in which you embed your overcomplete feature set into a lower dimensional space.
Uh, I think this would be a longer discussion than I feel up for at the moment, but I disagree with your prediction. I agree that the representational geometry in the model will be important and that it will be set up to help the model, but interference of circuits in superposition cannot be arranged to be helpful in full generality. If it were, I would take that as pretty strong evidence that whatever is going on in the model is not well-described by the framework of superposition at all.
If you have 100 orthogonal linear probes to read with, yes. But since there are only 50 neurons, the actual circuits for different input features in the network will have interference to deal with.
My understanding is that SPD cannot decompose an $m \times n$ matrix into more than $\min(m, n)$ subcomponents, and if all subcomponents are “live”, i.e. active on a decent fraction of the inputs, then it will have to have at most $\min(m, n)$ components to work.
SPD can decompose an $m \times n$ matrix into more than $\min(m, n)$ subcomponents.
I guess there aren’t any toy models in this paper that directly showcase this, but I’m pretty confident it’s true, because:
I don’t see why it wouldn’t be able to.
I’ve decomposed a weight matrix in a tiny LLM and got out way more live subcomponents than that. That’s a very preliminary result though, so you probably shouldn’t put that much stock in it.
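To be clear about the purely counting part of the claim, here is a trivial existence sketch in numpy (my own illustration, not SPD’s actual optimisation): an $m \times n$ matrix can be written as a sum of more than $\min(m, n)$ rank-1 terms.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
W = rng.normal(size=(m, n))            # min(m, n) = 4

# Write W as a sum of 12 rank-1 matrices by splitting each column into
# two rank-1 pieces (a scaled column times a one-hot row).
components = []
for j in range(n):
    a = rng.uniform(0.2, 0.8)          # arbitrary split of column j
    e_j = np.eye(n)[j]
    components.append(np.outer(a * W[:, j], e_j))
    components.append(np.outer((1 - a) * W[:, j], e_j))

print(len(components))                                          # 12 > min(m, n)
print(all(np.linalg.matrix_rank(C) == 1 for C in components))   # all rank 1
print(np.allclose(sum(components), W))                          # they sum to W
```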
Edit: as you pointed out, this might only apply when there’s not a nonlinearity after the weight. But every $W_{\text{out}}$ in a transformer has a connection running from it directly to the output logits through $W_U$. So SPD will struggle to interpret any of the output weights of transformer MLPs. This seems bad.
I think it’s the other way around. If you try to implement computation in superposition in a network with a residual stream, you will find that about the best thing you can do with the $W_{\text{out}}$ is often to just use it as a global linear transformation. Most other things you might try to do with it drastically increase noise for not much pay-off. In the cases where networks are doing that, I would want SPD to show us this global linear transformation.
But $W_{\text{out}}$ is reading those vectors off a 1000-dimensional vector space where there’s no interference between features.
They’re embedded randomly in the space, so there is interference between them in the sense of them having non-zero inner products.
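A quick numpy check of what I mean by interference here (my own sketch, made-up sizes): random unit vectors in 1000 dimensions have pairwise inner products of roughly $1/\sqrt{1000} \approx 0.03$, small but not zero, and those terms add up once many features are active at the same time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 500
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # n random unit vectors in R^d

dots = V @ V.T
off_diag = dots[~np.eye(n, dtype=bool)]
print(np.abs(off_diag).mean())   # roughly 0.025: small, but not zero
```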
Thanks to CCi(p)nCiS, we know that the toy model is not even doing computation in superposition, which is the case that SPD seems to be based on. It’s actually doing something really weird with the “noise”, which doesn’t actually behave well.
Yes. I agree that this makes the model not as great a testbed as we originally hoped.
I agree that quantum mechanics is not really central for this on a philosophical level. You get a pretty similar dynamic just from having a universe that is large enough to contain many almost-identical copies of you. It’s just that it seems at present very unclear and arguable whether the physical universe is in fact anywhere near that large, whereas I would claim that a universal wavefunction which constantly decoheres into different branches containing different versions of us is pretty strongly implied to be a thing by the laws of physics as we currently understand them.
It is very late here and I should really sleep instead of discussing this, so I won’t be able to reply as in-depth as this probably merits. But, basically, I would claim that this is not the right way to do expected utility calculations when it comes to ensembles of identical or almost-identical minds.
A series of thought experiments might maybe help illustrate part of where my position comes from:
Imagine someone tells you that they will put you to sleep and then make two copies of you, identical down to the molecular level. They will place you in a room with blue walls. They will place one copy of you in a room with red walls, and the other copy in another room with blue walls. Then they will wake all three of you up.
What color do you anticipate seeing after you wake up, and with what probability?
I’d say 2⁄3 blue, 1⁄3 red. Because there will now be three versions of me, and until I look at the walls I won’t know which one I am.
Imagine someone tells you that they will put you to sleep and then make two copies of you. One copy will not include a brain. It’s just a dead body with an empty skull. Another copy will be identical to you down to the molecular level. Then they will place you in a room with blue walls, and the living copy in a room with red walls. Then they will wake you and the living copy up.
What color do you anticipate seeing after you wake up, and with what probability? Is there a 1⁄3 probability that you ‘die’ and don’t experience waking up because you might end up ‘being’ the corpse-copy?
I’d say 1⁄2 blue, 1⁄2 red, and there is clearly no probability of me ‘dying’ and not experiencing waking up. It’s just a bunch of biomass that happens to be shaped like me.
As 2, but instead of creating the corpse-copy without a brain, it is created fully intact, then its brain is destroyed while it is still unconscious. Should that change our anticipated experience? Do we now have a 1⁄3 chance of dying in the sense that we might not experience waking up? Is there some other relevant sense in which we die, even if it does not affect our anticipated experience?
I’d say no and no. This scenario is identical to 2 in terms of the relevant information processing that is actually occurring. The corpse-copy will have a brain, but it will never get to use it, so it won’t affect my anticipated experience in any way. Adding more dead copies doesn’t change my anticipated experience either. My best-scoring prediction will be that I have a 1⁄2 chance of waking up to see red walls, and a 1⁄2 chance of waking up to see blue walls.
In real life, if you die in the vast majority of branches caused by some event, i.e. that’s where the majority of the amplitude is, but you survive in some, the calculation for your anticipated experience would seem to not include the branches where you die for the same reason it doesn’t include the dead copies in thought experiments 2 and 3.
(I think Eliezer may have written about this somewhere as well using pretty similar arguments, maybe in the quantum physics sequence, but I can’t find it right now.)