Someone I know claims to have found a way to directly pretrain neuralese models: https://aklein.bearblog.dev/zebra/
I’ve seen their prototype, and it definitely works, in the sense of producing reasonable text outputs while making non-trivial use of >100 continuous latents, but whether it actually amounts to anything remains to be seen.
This knocks on the door of a principle I have been playing with for a while: a good continual learning and/or sequence modelling algorithm should converge to some known behavior. Architectures like attention have undefined behavior once they run past their training context length. SGD, on the other hand, can be run indefinitely, because we know it will eventually converge to an interpolation of the data.
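A toy sketch of what I mean (my own illustration, not code from the linked prototype): run plain SGD on an overparameterized linear model. The update rule is the same at step 1 and step 1,000,000, and with more parameters than examples the residual shrinks toward zero, i.e. toward an interpolation of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 32))  # 8 examples, 32 params: interpolation is possible
y = rng.normal(size=8)
w = np.zeros(32)

lr = 0.01
for step in range(50_000):
    i = rng.integers(8)        # sample one example at random
    err = X[i] @ w - y[i]      # residual on that example
    w -= lr * err * X[i]       # SGD step on squared error

# max residual over the dataset shrinks toward 0 as steps increase
print(np.max(np.abs(X @ w - y)))
```

The contrast with a fixed-context architecture is the point: there is no step count at which this procedure leaves the regime it was "trained" in.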
I’ve prototyped and written about an architecture based on this principle [1], and have so far seen positive results on length extrapolation (note that the blog is out of date and only contains results from my earliest prototypes).
1: https://aklein.bearblog.dev/ittt/