Adam Scherlis

Karma: 1,176

Adam Scherlis 15 Apr 2026 0:20 UTC
8 points
6
in reply to: Orioth’s comment on: Morale
I don’t think the OP was using “belief” to mean Bayesian credence; more like “what your System 1 expects”. Also, you can definitely make it the case that your conditions improving is correlated with effort, and the OP gives several examples of how to do this. Changing the territory is a pretty good way to change your (approximate cached proxies for) beliefs in a predictable direction.

Adam Scherlis 6 Jan 2026 7:57 UTC
LW: 6 AF: 3
0
AF
on: SAE feature geometry is outside the superposition hypothesis
This post makes the excellent point that the paradigm that motivated SAEs—the superposition hypothesis—is incompatible with widely-known and easily demonstrated properties of SAE features (and feature vectors in general). The superposition hypothesis assumes that feature vectors have nonzero cosine similarity only because there isn’t enough space for them all to be orthogonal, in which case the cosine similarities themselves shouldn’t be meaningful. But in fact, cosine similarities between feature vectors have rich semantic content, as shown by circular embeddings (in several contexts) and feature splitting / dimensionality-reduction visualizations. Features aren’t just crammed together arbitrarily; they’re grouped with similar features.

I didn’t properly appreciate this point before reading this post (actually: before someone summarized the post to me verbally), at which point it became blindingly obvious.

There are some earlier blog posts that point out that superposition is probably only part of the story, e.g. https://transformer-circuits.pub/2023/superposition-composition/index.html on compositionality, but this one presents the relevant empirical evidence and its implications very clearly.

This post holds up pretty well: SAEs are still popular (although they’ve lost some followers in the last ~year), and the point isn’t specific to SAEs anyway (circular features embeddings are ubiquitous). Superposition is also still an important idea, although I’ve been thinking about it less so I’m not sure what the state of the art is.

My only complaint is that “maybe if I’m being more sophisticated, I can specify the correlations between features” is giving the entire game away—the full set of correlations is nearly equivalent to the embeddings themselves, and has all of the interesting parts.

But I think the rest of the post demonstrates an important tension between theory and experiment, which an improved theory has to be able to account for, and I don’t think I’ve heard of an improved theory yet.

Adam Scherlis 7 Nov 2025 22:25 UTC
5 points
0
in reply to: AprilSR’s comment on: The Tale of the Top-Tier Intellect
I like this observation!

To make the pair of strategies slightly more concrete, we could say that Bot A always picks LeelaPieceOdds’s top choice, out of moves that don’t hurt the game-theoretic status, and Bot B always picks its bottom choice out of those moves.

It would be fun to watch Bot B play a novice human, followed by a grandmaster. In both cases it might look a lot like a bored genius playing a newbie—hanging some pieces to even the odds, then playing competently with what’s left.

Adam Scherlis 7 Nov 2025 0:27 UTC
2 points
0
in reply to: Sergii’s comment on: Review: K-Pop Demon Hunters (2025)
I think the “all demons are fallen humans” interpretation is supported by one of Jinu’s lines: “That’s all demons do. Feel. Feel our shame, our misery. It’s how Gwi-Ma controls us”

Adam Scherlis 5 Nov 2025 20:58 UTC
2 points
0
in reply to: Measure’s comment on: Maxwell’s Demon and the Arrow of Time
Good point! I should have said something like “turning the wrong way with enough energy that, if its momentum were suddenly reversed, it would be expected to reach the next notch.”

Adam Scherlis 4 Nov 2025 8:20 UTC
5 points
0
in reply to: Garrett Baker’s comment on: FTL travel and scientific realism
My impression is that the averaged null energy condition (violated by negative mass) is much more widely accepted than the strong energy condition (violated by dark energy), but yeah, that’s a good point. (Also, I don’t really know why people think the ANEC or something similar has to be true.)

Adam Scherlis 9 Sep 2025 20:07 UTC
1 point
0
in reply to: Gunnar_Zarncke’s comment on: What can we learn about childrearing from J. S. Mill?
1. This is an April Fool’s post.
2. J S Mill’s father was much less influential than either Mill or Bentham and the excerpt in the post does not describe him as having any particular intellectual abilities. (In fact he was a fairly successful writer in his day, but not remotely comparable to his son.)
3. Utilitarianism is not passed down genetically.

Adam Scherlis 7 Apr 2025 17:26 UTC
6 points
0
in reply to: James Camacho’s comment on: How much progress actually happens in theoretical physics?
I think you unfortunately can’t really verify the recent epistemic health of theoretical physics, without knowing much theoretical physics, by tracing theorems back to axioms. This is impossible to do even in math (can I, as a relative layperson, formalize and check the recent Langlands Program breakthrough in LEAN?) and physics is even less based on axioms than math is.
(“Even less” bc even math is not really based on mutually-agreed-upon axioms in a naive sense, cf. Proofs and Refutations or the endless squabbling over foundations.)
Possibly you can’t externally verify the epistemic health of theoretical physics at all, post-70s, given the “out of low hanging empirical fruit” issue and the level of prerequisites needed to remotely begin to learn anything beyond QFT.
Speaking as a (former) theoretical physicist: trust us. We know what we’re talking about ;)

Adam Scherlis 7 Apr 2025 17:16 UTC
4 points
1
in reply to: ChristianKl’s comment on: How much progress actually happens in theoretical physics?
I’m not Mitchell, but I think I agree with him here enough to guess: He probably means to say that production of new plausible theories has increased, production of experimentally verified theories has stalled, and the latter is not string theory’s fault.
(And of course this whole discussion, including your question, is interpreting “physics” to means “fundamental physics”, since theoretical and empirical work on e.g. condensed matter physics has been doing just fine.)

Adam Scherlis 7 Apr 2025 17:10 UTC
5 points
2
in reply to: Garrett Baker’s comment on: How much progress actually happens in theoretical physics?
I am not going to spend more than a few minutes here or there to give “speaking as a physicist” takes on random LW posts; I think convincing people that my views are correct in full detail would require teaching them the same things that convinced me of those views, which includes e.g. multiple years of study of QFT.
Instead, I tend to summarize what I think and invite people to ask specific questions about e.g. “why do you believe X” if they want to go further down the tree or test my beliefs more aggressively.
“That doesn’t answer the question because I am not convinced by everything you said” is not really a helpful way to do that imo.

Adam Scherlis 7 Apr 2025 17:05 UTC
2 points
0
in reply to: Garrett Baker’s comment on: How much progress actually happens in theoretical physics?
To spell out my views: there has been a bit of a real slow-down in theoretical physics, because exploring the tree of possible theories without experiment as a pruning mechanism is slower than if you do get to prune. I think the theory slowdown also looks worse to outsiders than it is, because the ongoing progress that does happen is also harder to explain due to increasing mathematical sophistication and a lack of experimental correlates to point to. This makes e.g. string theory very hard to defend to laypeople without saying “sorry, go learn the theory first”.
This is downstream of a more severe slowdown in unexplained empirical results, which results from (imo) pretty general considerations of precision and energy scales, per the modern understanding of renormalization, which suggest that “low-hanging fruit gets picked and it becomes extremely expensive to find new fruit” is a priori pretty much how you should expect experimental physics to work. And indeed this seems to have happened in the mid 20th century, when lots of money got spent on experimental physics and the remaining fruit now hangs very high indeed.
And then there’s the 90s/2000s LHC supersymmetry hype problem, which is a whole nother (related) story.

Adam Scherlis 5 Apr 2025 6:12 UTC
9 points
1
on: How much progress actually happens in theoretical physics?
The main thing I’d add is that string theory is not the problem. All the experimental low hanging fruit was picked decades ago. There are very general considerations that suggest that any theory of quantum gravity, stringy or otherwise, will only really be testable at the Planck scale. What this means in practice is that theoretical high-energy physics doesn’t get to check its answers anymore.
I think there’s still progress, and still hope for new empirical results (especially in astrophysics and cosmology), but it’s much harder without a barrage of unexplained observations.

Adam Scherlis 3 Apr 2025 20:58 UTC
3 points
0
in reply to: Lucius Bushnaq’s comment on: Estimating the Probability of Sampling a Trained Neural Network at Random
Great questions :)

The approach here is much faster than the SGLD approach; it only takes tens or hundreds of forward passes to get a decent estimate. Maybe that’s achievable in principle with SGLD, but we haven’t managed it.
I like KFAC but I don’t think estimating the Hessian spectrum better is a bottleneck; in our experiments on tiny models, the true Hessian didn’t even always outperform the ADAM moment estimates. I like the ideas here, though!

The big downside of our approach, compared to Timaeus’s, is that it underestimates basin size (overestimates complexity) for two reasons:
1) Jensen bias: the “pancake” issue, which we can alleviate a bit with preconditioners
2) The “star domain” constraint we impose (requiring line-of-sight between the anchor point and the rest of the basin) is arguably pretty strict, although we think it holds by default for the “KL neighborhood” variant.
It’s not clear that this is an obstacle in practice, though, in settings where you just want a metric of complexity that runs fast and has approximately the right theoretical and empirical properties to do practical work with.

We’ve been working on using SGLD and thermodynamic integration to get a more-trusted measurement of basin size, but we suspect the most naive version of our estimator (or the Adam-preconditioned version) will be most practical for downstream applications.

We use average KL divergence over a test set as our behavioral loss, and (for small models where it’s tractable) we use the Hessian of KL, i.e. the Fisher.

Adam Scherlis 1 Apr 2025 3:22 UTC
1 point
0
in reply to: Lucius Bushnaq’s comment on: Estimating the Probability of Sampling a Trained Neural Network at Random
I am not sure I agree :)
It is unimportant in the limit (of infinite data), but away from that limit, it is only unimportant by a factor of 1/log(data), which seems small enough to be beatable in practice in some circumstances.
The spectra of things like Hessians tend to be singular, yes, but also sort of power-law. This makes the dimensionality a bit fuzzy and (imo) makes it possible for absolute volume scale of basins to compete with dimensionality.
Essentially: it’s not clear that a 301-dimensional sphere really is “bigger” than a 300-dimensional sphere, if the 300-dimensional sphere has a much larger radius. (Obviously it’s true in a strict sense, but hopefully you know what I’m gesturing at here.)

Adam Scherlis 1 Apr 2025 3:16 UTC
1 point
0
in reply to: Lucius Bushnaq’s comment on: Estimating the Probability of Sampling a Trained Neural Network at Random
I think this is correct but we’re working on paper rebuttals/revisions, I’ll take a closer look very soon! I think we’re working along parallel lines.
In particular, I have been thinking of “measure volumes at varying cutoffs” as being more or less equivalent to “measure LLC at varying ε”.

We choose expected KL divergence as a cost function because it gives a behavioral loss, just like your behavioral LLC, yes.
I can give more precise statements once I look at my notes.

Adam Scherlis 1 Mar 2025 2:28 UTC
6 points
0
on: Estimating the Probability of Sampling a Trained Neural Network at Random
If you’re wondering if this has a connection to Singular Learning Theory: Yup!
In SLT terms, we’ve developed a method for measuring the constant (with respect to n) term in the free energy, whereas LLC measures the log(n) term. Or if you like the thermodynamic analogy, LLC is the heat capacity and log(local volume) is the Gibbs entropy.
We’re now working on better methods for measuring these sorts of quantities, and on interpretability applications of them.

Adam Scherlis 7 Nov 2024 4:19 UTC
7 points
3
in reply to: ZY’s comment on: Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
It stops being in the interests of CATXOKLA to invite more states once they’re already big enough to dominate national electoral politics.

Adam Scherlis 7 Nov 2024 4:17 UTC
10 points
0
in reply to: Eric Neyman’s comment on: Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
The non-CATXOKLA swing states can merge with each other and a few red and blue states to form an even bigger bloc :)

I think there’s a range of stable equilibria here, depending on the sequence of merges, with the largest bloc being a majority of any size. I think they all disenfranchise someone, though.

So you can’t ever get to a national popular vote, without relying on things like the NPVIC which shortsightedly miss the obvious dominating strategy of a 51% attack against American democracy.

Adam Scherlis 26 Jun 2024 1:52 UTC
5 points
0
on: SAE feature geometry is outside the superposition hypothesis
I strongly agree with this post.
I’m not sure about this, though:
We are familiar with modular addition being performed in a circle from Nanda et al., so we were primed to spot this kind of thing — more evidence of street lighting.
It could be the streetlight effect, but it’s not that surprising that we’d see this pattern repeatedly. This circular representation for modular addition is essentially the only nontrivial representation (in the group-theoretic sense) for modular addition, which is the only (simple) commutative group. It’s likely to pop up in many places whether or not we’re looking for it (like position embeddings, as Eric pointed out, or anything else Fourier-flavored).
Also:
As for where in the activation space each feature vector is placed, oh that doesn’t really matter and any nearly orthogonal overcomplete basis will do. Or maybe if I’m being more sophisticated, I can specify the correlations between features and that’s enough to pin down all the structure that matters — all the other details of the overcomplete basis are random.
The correlations between all pairs of features are sufficient to pin down an arbitrary amount of structure—everything except an overall rotation of the embedding space—so someone could object that the circular representation and UMAP results are “just” showing the correlations between features. I would probably say the “superposition hypothesis” is a bit stronger than that, but weaker than “any nearly orthogonal overcomplete basis will do”: it says that the total amount of correlation between a given feature and all other features (i.e. interference from them) matters, but which other features are interfering with it doesn’t matter, and the particular amount of interference from each other feature doesn’t matter either. This version of the hypothesis seems pretty well falsified at this point.

Adam Scherlis 24 Apr 2024 17:57 UTC
14 points
9
on: What’s up with all the non-Mormons? Weirdly specific universalities across LLMs
I suspect a lot of this has to do with the low temperature.

The phrase “person who is not a member of the Church of Jesus Christ of Latter-day Saints” has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.

Going word by word, “person who is not a member...” is all nice and vague and generic; by the time you get to “a member of the”, obvious continuations are “Church” or “Communist Party”; by the time you have “the Church of”, “England” is a pretty likely continuation. Why Mormons though?

“Since 2018, the LDS Church has emphasized a desire for its members be referred to as “members of The Church of Jesus Christ of Latter-day Saints”.”—Wikipedia

And there just aren’t that many other likely continuations of the low-temperature-attracting phrase “members of the Church of”.

(While “member of the Communist Party” is an infamous phrase from McCarthyism.)

If I’m right, sampling at temperature 1 should produce a much more representative set of definitions.