CarolusRenniusVitellius

Karma: 152

My name is Charles Renshaw-Whitman. I am a physicist (‘symbol gremlin’) by training, currently a MATS scholar studying the connection between the structure of natural data and the structure of learned computations.

Currently training away an aversion to sharing my writing/thoughts publicly—please modulate tone of comments accordingly :)

CarolusRenniusVitellius 10 Jun 2026 4:49 UTC
1 point
0
in reply to: simulus’s comment on: Some Interesting Papers on RLVR
I’ll take a look at the ProRL paper later today, thanks for the second.

I agree that RL inefficiency is one problem, but I think it can reasonably be factored out in experiments, if not in production. The “RL Razor” paper does an experiment where they do SFT on a KL budget and show they get the same ‘reduced forgetting’ effect. In light of the ‘off the principals’ paper, I think of this reduced forgetting as something like an inertia against learning new representations, or an inability to pass through regions of high curvature. That is, there are definitely still qualitative differences at the per-batch level; perhaps this is just an efficiency thing, but it’s plausible to me that it might be more like GD vs. SGD, finding different types of solutions with different generalization properties because of the curvature bias. On priors this would surprise me for RL, but I guess I like these papers because they updated me away from that a bit.

As for the milestone self-play results, you’re right that they’ve no place in this story—my semi-cope pro-tem guess is that LLMs operate in a ‘different regime’. Two intuitions for this:
1. For the board games especially, there is no ‘curriculum of representations’: perhaps learning strategic action is easier for RL than learning good ontic chunkings of the world. E.g., for Go, there is no need to learn complex hierarchical representations as table stakes. The harder Atari games (e.g., Montezuma’s Revenge) are counter-evidence to this, except perhaps insofar as their hardness was due to their needing more complex representations? This is an even poorer explanation for something like AlphaStar. Nonetheless, we ended up switching to the pre-training paradigm instead of riding A3C to ASI.
2. The relatively poor performance of things like process supervision for LLM training is still surprising to me, and I can’t account for it. If value-learning methods ‘don’t work’ in the LLM regime, is this because there is a structural difference in the data? On priors this just has to be a skill issue, but presumably, had someone really solved this and gotten 3-5 OOMs of RLVR efficiency, wouldn’t we be done already? And presumably enough money and effort have been expended that, were dramatic success achievable, it’d’ve been done? (To be clear, by ‘work’ I mean “work so well that per-token gradients aren’t more than one or two OOMs worse than for pre-training”.)

Some Interesting Papers on RLVR

CarolusRenniusVitellius 9 Jun 2026 19:00 UTC

23 points

5 comments, 4 min read

CarolusRenniusVitellius 6 Jun 2026 15:01 UTC
2 points
0
in reply to: cdt’s comment on: Coalitional Darwinism and the Instrumental Utility of Individuality
Indeed, they’re the same: ‘drift-diffusion’ is the name for the same thing in physics and ML. I’ll add a note, thanks.

Coalitional Darwinism and the Instrumental Utility of Individuality

CarolusRenniusVitellius 6 Jun 2026 12:53 UTC

25 points

4 comments, 17 min read

(charlesr-w.github.io)

CarolusRenniusVitellius 29 May 2026 20:32 UTC
4 points
1
on: When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
Cool work! I am curious whether you have in mind any ‘natural kinds’ that you might be able to pull out of the tensor structure? I agree that TNs seem pretty promising for interp due to their tractability, and seem fairly clearly to be in the same ‘universality class broadly construed’ as transformers. My concern in applying this would be that, iiuc, it doesn’t natively incorporate a measure on output space, so for LLMs you’d get a disproportionate fraction of dissimilarity coming from behavioural differences on random strings (I think this is a fundamentally hard problem and you seem clearly aware of it, so no shade thrown). I have been thinking about local kernels like the eNTK (= the cosine similarity of output gradients, as a function of two samples) lately for this reason, but would be super excited if you could use TNs to trade the weight-space-local-but-linear eNTK for a global multilinear TN sorta thing. And then perhaps tensor decompositions would yield some objects with a comparable claim to semantic meaning as the eigenvectors of the eNTK?
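The eNTK mentioned above can be made concrete with a small sketch. It assumes a one-hidden-layer tanh network with hand-written gradients, and all function names are hypothetical, chosen for illustration. It uses the cosine-normalized variant described in the comment; the unnormalized inner product of gradients is the more common definition. The eigenvectors of the resulting Gram matrix are functions on the sample set, which is the object the comment proposes comparing against tensor decompositions.

```python
import numpy as np


def init_params(d_in, d_hidden, seed=0):
    """Hypothetical toy network f(x) = w2 . tanh(W1 @ x + b1) with a scalar output."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_hidden, d_in))
    b1 = np.zeros(d_hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), size=d_hidden)
    return W1, b1, w2


def forward(params, x):
    W1, b1, w2 = params
    return w2 @ np.tanh(W1 @ x + b1)


def output_grad(params, x):
    """Flattened gradient of the scalar output f(x) w.r.t. (W1, b1, w2)."""
    W1, b1, w2 = params
    h = np.tanh(W1 @ x + b1)
    delta = w2 * (1.0 - h**2)  # df / d(pre-activation)
    return np.concatenate([
        np.outer(delta, x).ravel(),  # df/dW1, flattened row-major
        delta,                       # df/db1
        h,                           # df/dw2
    ])


def entk_gram(params, X, normalize=True):
    """Empirical NTK Gram matrix K[i, j] = <grad f(x_i), grad f(x_j)>.

    With normalize=True each entry is the cosine similarity of the two
    output gradients, matching the definition used in the comment.
    """
    G = np.stack([output_grad(params, x) for x in X])
    K = G @ G.T
    if normalize:
        norms = np.linalg.norm(G, axis=1)
        K = K / np.outer(norms, norms)
    return K


params = init_params(d_in=4, d_hidden=32)
X = np.random.default_rng(1).normal(size=(6, 4))
K = entk_gram(params, X)
eigvals, eigvecs = np.linalg.eigh(K)  # each column of eigvecs is a function on the 6 samples
```

In a real model one would get the per-sample gradients from autodiff rather than by hand; the manual version just keeps the sketch dependency-light and checkable.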

CarolusRenniusVitellius 29 May 2026 4:50 UTC
1 point
0
on: CharlesRW’s Shortform
A quick argument for why ‘consciousness’ (a central executive function) might be a convergent property of intelligent behaviour (i.e., why Claude might be/come conscious):

5-word summary: Bottom-up hierarchy requires orchestration

Evolution operates at a finite speed and resolution --> most of the structure in complex organisms is not encoded by the genome directly. Instead, evolution often follows a motif:
1. Self-organizing progenitor units: typically the result of a lower level of this process.
2. Interaction between progenitor units (usually with some dichotomy like ‘excite/inhibit’, ‘activate/suppress’)
3. Regulatory signals mediating interaction between the progenitor units
4. Differentiation of progenitor units according to interactions + regulatory signals
Example: Neurons --> Neural circuits
1. Progenitors: Neurons
2. Competitive/Inhibitory Interactions: Neural signaling + Hebbian learning
3. Regulatory signals: Nerve growth factors (presumably there are ~infinity other regulators)
4. Differentiation into neural circuits via winner-take-all dynamics.
At the level of the unit, neurons and their synapses are built so that they create differentiated ‘circuits’ by default via Hebbian learning and random synaptic connections. This is ‘self-organization’. The excitation/inhibition dichotomy is important because it allows organization to come from ‘balancing’, which is generically more robust: 1 + (-1) = 0 is different from 100 + (-100) = 0 when your noise is O(1). This also makes gating effective, as a small gating signal can now direct a large response, as for transistors.
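A toy numerical illustration of the gating point, under assumptions of my own rather than any biological model: a unit receives excitation A and inhibition (1 - g) * A, where g is a small gating signal, plus noise with a fixed O(1) standard deviation. The gate shifts the mean output by g * A, so the same gate is swamped by noise when A = 1 but clearly detectable when A = 100. The function names are hypothetical.

```python
import numpy as np


def gated_output(amplitude, gate, noise_std=1.0, n_trials=20_000, seed=0):
    """Toy unit: excitation A minus inhibition (1 - gate) * A, plus O(1) noise."""
    rng = np.random.default_rng(seed)
    excitation = amplitude
    inhibition = (1.0 - gate) * amplitude  # the gate weakens inhibition
    noise = rng.normal(0.0, noise_std, size=n_trials)
    return excitation - inhibition + noise


def gating_snr(amplitude, gate=0.1, noise_std=1.0):
    """Shift in mean output caused by the gate, in units of the noise std."""
    on = gated_output(amplitude, gate, noise_std, seed=1)
    off = gated_output(amplitude, 0.0, noise_std, seed=2)
    return (on.mean() - off.mean()) / noise_std


for A in (1.0, 100.0):
    print(f"A = {A:>5}: gating SNR ~ {gating_snr(A):.2f}")
```

Both units sit at the same balanced operating point (zero mean output without the gate); only the magnitude of the balanced drives differs.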

The genome converts this into useful functionality by tuning this self-organization process. It controls inputs, outputs, and the local ‘rules of the game’ - using proteins connected to signalling networks accessible to the genome. Input optic nerve, output to V2, mix in some (heinously complex pattern of growth signals) --> get V1 visual cortex.

This is ‘facilitated variation’ - imagine evolution had encoded the connectome in its entirety—then every mutation would mess things up, just as randomly flipping bits in your laptop’s RAM would. Bad, but also unlearnable—evolution needs useful mutations to learn. Instead, mutations affect the signalling program, and this generates more ‘meaningful’ variation. Morally, hierarchical organization lets the genome create N levels of structure in O(log N) complexity.

The major difference from computers: every layer in this process has to equilibrate simultaneously with its constituents and its peers. When a circuit updates in response to other circuits, its constituent neurons must instantiate this macro-level update and must re-equilibrate with one another (e.g., maintain a balance of inhibitory and excitatory signalling). This means that tuning a node at level L could in principle cause O(e^L) updates at the ‘bottom’; constraining that exponential is presumably an engineering desideratum. This is a model that makes gated development à la Piaget somewhat natural: you do your best at level L, then lock it in before moving to L+1.
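One way to cash out the exponential estimate, assuming (my simplification) that every unit at level ℓ is built from a fixed number b > 1 of units at level ℓ - 1: a node at level L then sits atop N_0(L) level-0 units, and in the worst case retuning it forces all of them to re-equilibrate.

```latex
N_{0}(L) = b^{L} = e^{(\ln b)\,L}
```

So the worst-case bottom-level update count grows exponentially in L, with rate set by the branching factor b.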

If this is ‘bottom-up’, computers are ‘top-down’ - you can design N layers of abstraction in O(log N) effort by tiling ever-larger circuits; the difference is that this structure is specified at design time by engineering away lower-level variation, rather than taming it. There is no adaptation—bit flips are bad with probability ~1. The cost of designing M variants of a circuit is O(M), while evolution does this by default.

Application to cognition: posit the existence of ‘subminds’, the result of lower levels of this process. ‘The Mind Illuminated’ and ‘Internal Family Systems’, i.a., seem to have models of the mind based on this idea. My guess is that these subminds are not well thought of as spatially localized regions of the brain; I have in mind something more like ‘processes competing for thread time allocated by the OS’.

I’d sketch this as:
1. Progenitors: sub-minds
2. Interactions: ???
3. Regulatory signal: conscious/executive attention (or something like that)
4. Differentiation: ??? (the ‘mind’ in this case is the ‘top of the hierarchy’, maybe?)

My point is much less the (absent) particulars, and more “hierarchical organization by encoding regulated self-organization seems to be the default way to generate useful complexity at scale. If you think LLMs trained by SGD learn some kind of hierarchical structure or representations, it’s a small leap to imagine gating mechanisms that look like metacognition.”

Note that this obviously depends on your model of how/what LLMs learn. And note that this doesn’t imply valence or moral patienthood: “whereof we cannot speak...” etc. See also the Anthropic paper on ‘emergent introspection’; I don’t yet have an opinion on whether it is evidence for the model I discuss above, but it has informed my thinking.

(This note is based on conversations with Lorxus and Richard Ngo, and sprang from my research in MATS 9.1. My thanks to all involved.)

CarolusRenniusVitellius 30 Apr 2026 21:01 UTC
2 points
0
on: Maybe I was too harsh on deep learning theory (three days ago)
Hey, commendations on sharing your update.

Another similar line of work I like is Roberts+Yaida’s “Principles of Deep Learning Theory”—this is a similar-in-spirit approach to MFT, but they perturb around a different limit and get feature-learning as a finite-width effect. I haven’t studied MFT to compare the validity of the two; my guess is MFT is the more relevant description. PDLT at least does a very good job modernizing the NTK approach and connecting to the older literature. I’m a fanboy as it was my gateway drug for learning theory lol.
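Schematically (my paraphrase, with sample indices and layer labels suppressed), the PDLT picture is an expansion of the neural tangent kernel Θ in 1/n, where n is the width. At infinite width the leading kernel Θ^{(0)} stays fixed during training, which is the no-feature-learning NTK regime; the kernel's evolution during training, and hence representation learning, first appears at order 1/n, with the depth-to-width ratio L/n controlling how large these corrections get.

```latex
\Theta_t = \Theta^{(0)} + \frac{1}{n}\,\Theta^{(1)}_t + O\!\left(\frac{1}{n^{2}}\right),
\qquad
\frac{d\Theta^{(0)}}{dt} = 0 .
```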

CarolusRenniusVitellius 25 Apr 2026 11:50 UTC
4 points
0
on: Quick Paper Review: “There Will Be a Scientific Theory of Deep Learning”
I find the framing in your review somewhat odd: I think the state of ‘deep learning theory’ is fairly impressive, and that its sterility vis-à-vis frontier LLMs is a hint that we are looking in the wrong place. Early-2026 DLT is a major piece of evidence that we need more data-centric theory, precisely because sophisticated theory has had so much trouble connecting to the frontier. If we had worse theory, we would be more uncertain whether the relevant complexities were to be located in the data or in the learning process.

Two analogies I have in mind that guide my thinking here:
- Understanding the physics of animal neurons is necessary for understanding neuroscience. The complexity of brains exists at a ‘higher level’ than individual neurons, but understanding neurons carves out how much of that complexity can be subcellular vs. located in their larger-scale organization. In the same way, something like LLM behaviour is a product of training process, data, etc., and a good theory of learning lets us ask what part of that complexity belongs where.
- Statistical physics as a formalism provides a family of techniques for analyzing physical systems with many degrees of freedom. Its great intellectual triumph is the discovery that some things depend on the details of how those degrees of freedom interact, while others do not: behaving like a gas is a highly generic property, but material properties like fatigue and fracture strength can depend quite sensitively on the specifics of the sample in question. The key thing is that we can try to figure out which properties are ‘universal’ and which are not. In DLT, I think it’s much more of a spectrum, as there are many more knobs to tune: data, gross and fine architecture, hyperparameters, etc.

Of note to me is that most of the successes of DLT have been at the level of structural depth you’d expect from studying neurons as they relate to brain function. E.g., each neuron that fires must, on average, activate exactly one other neuron (lest one die or have a grand mal; this is comparable to the ‘edge of chaos’ in DLT). These are pretty coarse results, more to do with signal processing than with structured computation. It is still illuminating to see that these coarse results hold, because they validate our mental models (‘however this neural network is computing stuff, it still has to navigate some kind of signal/noise tradeoff in its activations’).
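The DLT side of this analogy can be sketched in its simplest setting, a deep random linear network. This is a simplification: the usual ‘edge of chaos’ analysis concerns nonlinear networks and tracks correlations between inputs as well as norms. With weights drawn i.i.d. from N(0, sigma_w^2 / n), each layer multiplies the expected squared norm of the signal by sigma_w^2, so sigma_w = 1 plays the role of ‘each firing neuron activates on average one other neuron’: below it the signal dies out with depth, above it the signal blows up. The function name and sizes are hypothetical.

```python
import numpy as np


def norm_gain(sigma_w, depth=20, width=256, seed=0):
    """||x_L|| / ||x_0|| for a deep random *linear* network.

    Weights are drawn i.i.d. N(0, sigma_w**2 / width), so each layer
    multiplies the expected squared norm by sigma_w**2.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    x0_norm = np.linalg.norm(x)
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        x = W @ x
    return np.linalg.norm(x) / x0_norm


for sigma_w in (0.5, 1.0, 2.0):
    print(f"sigma_w = {sigma_w}: gain after 20 layers ~ {norm_gain(sigma_w):.3g}")
```

Only the critical value keeps the gain O(1) over many layers; the other two drift by roughly a factor of sigma_w per layer.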

Insofar as the intelligence of LLMs is the ability to generalize, the no-free-lunch theorems tell us that this generalization has to reflect common structure between the pre-training data and the fine-tuning task (duh!). But our theory of data isn’t yet advanced enough to talk more than proleptically about that structure: e.g., a claim that fine-tuning is ‘conditioning nodes in the common sparse hierarchical latent world model’ is descriptive and substantive, but not precise enough to be easily falsifiable. “Write Ruby code” and “Write Python code” are obviously more similar to each other than either is to “design a jet turbine”, but given only a black-box loss function for each of those 3 tasks, it’s not clear how we could determine that similarity a priori, in a principled way, from the ‘geometry’ of the functions alone.

CarolusRenniusVitellius 27 Feb 2026 15:21 UTC
1 point
0
in reply to: David Africa’s comment on: What I Got From 1.5 Years In Slightly-Competitive Debate
Yeah, indeed, I think my engagement profile was different from the more competitive attitudes you’d expect high-powered teams/unis to have: our debating club was a small one at a school self-consciously focused on engineering, so it was much more hobbyist. I was always in it for self-growth rather than winning, so it served as a good check on the kind of intellectual hubris that accrued as I was levelling up in physics. We were fortunate to have a handful of emeritus members who could pass on a lot of their knowledge without expecting a big commitment to competition performance. But I’m not surprised (and sorry to hear!) that there are suckier steady states.

What I Got From 1.5 Years In Slightly-Competitive Debate

CarolusRenniusVitellius 27 Feb 2026 5:37 UTC

23 points

6 comments, 8 min read

(charlesr-w.github.io)

CarolusRenniusVitellius 20 Feb 2026 0:21 UTC
1 point
0
in reply to: Ustice’s comment on: Power Laws Are Not Enough
Thanks! “ime” = ‘in my experience’, if that is what you’re referring to? If not, sorry, I don’t see it.

Power Laws Are Not Enough

CarolusRenniusVitellius 19 Feb 2026 4:31 UTC

10 points

3 comments, 4 min read

(charlesr-w.github.io)

CarolusRenniusVitellius 17 Feb 2026 20:02 UTC
1 point
0
in reply to: Richard_Ngo’s comment on: ricraz’s Shortform
As a comment on identity in science: certainly it’s not about ‘personal/agentic’ identity, but I have been thinking a lot about how we draw boundaries between objects, like “when” a group of 3 quarks “is” a proton. Generally this involves specifying a “scale” parameter expressing roughly how much information we’re willing to lose in abstracting away from details; then we take, symbolically, the limit where this parameter goes to infinity. Then you can use perturbation theory to bridge lower-level effects so that they cash out at the higher level (e.g., how quark structure affects proton collision statistics).

For systems of agents, the analogue is, from the perspective of the group-level, ‘coherence’, or ‘incentive-compatibility’ from the sub-agents’ perspective. Unfortunately we don’t really have the tools to do the analogue of perturbation theory in these more complicated cases. It seems like the salient difference is that we can’t really scalarize ‘incoherence’, as there are too many saliently distinct ways for a group to be ‘incoherent’ and no natural commensurability between them.

CarolusRenniusVitellius 14 Feb 2026 16:44 UTC
2 points
0
in reply to: Ashe Vazquez Nuñez’s comment on: Every Measurement Has a Scale
Yup! A math-ier version of the insight I was trying to convey: if you imagine dealing with math where you have a handful of these limit-y properties floating around, it may well matter what order you take those limits in! At the same time, the limit-y-ness of these properties is usually quite “hidden”, which is what makes this a very useful mental motion.
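A standard textbook example of limit order mattering (not taken from the post itself): with m held fixed, n/(n+m) tends to 1 as n grows, and with n held fixed it tends to 0 as m grows, so the two iterated limits disagree.

```latex
\lim_{m\to\infty}\,\lim_{n\to\infty} \frac{n}{n+m} = 1,
\qquad
\lim_{n\to\infty}\,\lim_{m\to\infty} \frac{n}{n+m} = 0 .
```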

CarolusRenniusVitellius 10 Feb 2026 0:44 UTC
3 points
0
in reply to: StanislavKrym’s comment on: Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare
Ahah I have Claude’s system prompt set to default to Chinese so I can practice. Since my speaking/writing abilities suck much worse than my reading, I also told it to nag me to write in Chinese for the sake of practice lol. This works… variably well.

CarolusRenniusVitellius 9 Feb 2026 19:45 UTC
1 point
0
in reply to: CstineSublime’s comment on: CharlesRW’s Shortform
Yup, I definitely agree there’s no special role for unicellular attackers; I was eliding the complexity for brevity. I think the asymmetry still broadly holds, e.g. multicellular parasites are very complex attackers but have much longer generation times (I assume?), so they too trade off online vs. offline optimization bits. Nonetheless, the host organism still has more complexity to draw on for most things with which the immune system is concerned.

Interesting to think about the Pareto frontier of offline vs. online optimization. The multicellular parasites and unicellular microbes would be paradigm examples. But the microbiome gives the lie to this idea: it is complex and organized but still highly adaptive, because selection can act on the lower level. Perhaps being ~commensal/mutualistic instead of adversarial is related? I don’t know.

CarolusRenniusVitellius 9 Feb 2026 4:52 UTC
1 point
0
in reply to: CarolusRenniusVitellius’s comment on: CharlesRW’s Shortform
The linked Claude conversation doesn’t share the markdown file, unfortunately; apologies. I have since read it: it was a good introduction, but it did a mediocre job of reframing things ‘as an optimizer’ or even ‘as a control system’. Here is a Google Drive link: https://drive.google.com/file/d/1wpPGI7poP04ZMDoPU_lEh8D1CI8kOtic/view?usp=sharing

CarolusRenniusVitellius 9 Feb 2026 2:37 UTC
1 point
0
in reply to: CarolusRenniusVitellius’s comment on: CharlesRW’s Shortform
Note: skimming it, I see that Claude hallucinates what Alon’s ‘periodic table of diseases’ is. He has a pretty good YouTube video on it that you can watch instead: https://www.youtube.com/watch?v=ZMz_C778WMY&pp=ygUeYWxvbiBwZXJpb2RpYyB0YWJsZSBvZiBkaXNlYXNl

CarolusRenniusVitellius 8 Feb 2026 21:07 UTC
6 points
−1
on: CharlesRW’s Shortform
The Immune System as Anti-Optimizer

We have a short list of systems we like to call “optimizers” — the market, natural selection, human design, superintelligence. I think we ought to hold the immune system in comparable regard; I’m essentially ignorant of immunobiology beyond a few YouTube videos (perhaps a really fantastic LW sequence exists of which I am unaware), but here’s why I am thinking this.

The immune system is the archetypal anti-optimizer: it defends a big multicellular organism from rapidly evolving microbiota. The key asymmetry:
1. Specified once by the genome. It cannot rewrite its own source code between generations. Its adversaries evolve orders of magnitude faster.
2. Resource asymmetry as counterbalance. Whole organs are devoted to the adaptive immune response. A microbe is one cell; the immune system is a civilization.
3. This extends to cancer. The immune system typically out-adapts malignant cells despite selection acting far more rapidly on them. Immunosurveillance fails not because it is weak but because cancers occasionally evolve to exploit its specific tolerance mechanisms.
In short: the immune system embodies enough amortized optimization power to defend against online adversarial attacks by natural selection, because these attacks are constrained by the comparative simplicity of the attackers. One optimizer constrains another, faster and more adaptive optimizer, by having more resources.

What makes this especially interesting is that the immune system has no discernible volition. It is complex — probably far more so than I appreciate — but intuitively much more like a thermostat than a scheming eldritch god. It optimizes powerfully, within bounds that feel legible and non-agential.

I will not be so crass as to say “big if true for alignment”, but you are permitted to infer this if it please you. I just think it’s neat. Consider the mere phrase “semiotic immune system” (from, if I recall correctly, Charles Stross’s Accelerando) — suggests a lot at once, eh?

I asked Claude to prepare a tutorial developing this theme, which I have not yet read (longa est vita, si uti bene scias...): https://claude.ai/share/67bb8de3-b73c-4a21-916b-70affba0da43

*Written with slight corrections for conciseness from Opus 4.6. Ironically, the em-dashes are mine.
What links here?
- Coalitional Darwinism and the Instrumental Utility of Individuality by CarolusRenniusVitellius (6 Jun 2026 12:53 UTC; 25 points)
- StanislavKrym's comment on Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare by Zvi (9 Feb 2026 22:09 UTC; 1 point)

Every Measurement Has a Scale

CarolusRenniusVitellius 8 Feb 2026 20:07 UTC

17 points

4 comments, 4 min read

(charlesr-w.github.io)
