Software engineering, parenting, cognition, meditation, other
Linkedin, Facebook, Admonymous (anonymous feedback)
Gunnar_Zarncke
Victor Taelin on X is impressed:
this is my personal singularity moment
this post may sound like a paid ad. I only wish. I’m concerned, more so than happy. the world is changing, and, among the scenarios where AI goes terribly wrong, inequality is the most realistic, yet, the one Anthropic seems to be the least concerned about. I’m glad OpenAI is taking the opposite stance: *personal AGI for everyone*. I think this is a commendable position in the times we live. but who am I in the queue of the bread?
anyway, Fable is here, so I’ll just report my first-hour experience
first of all, all my pet prompts are solved.
→ λ-calculus puzzles
→ bug questions
→ one-shot apps
all are trivial to it.
I don’t have anything harder other than my ongoing work
so, in the last several days, I’ve been toying with HVM5, a new interaction net evaluator with a faster loop.
after writing the first version, I left 32 GPT-5 agents working for ~20 hours each. this resulted in up to 2x speedups, but the file size increased by 2-fold and quality decreased significantly.
I then simplified the whole thing into an even simpler core, and left Opus 4.8 and GPT 5.5 optimizing it for 8 hours. Opus got a legit 6% − 34% speedup in most benches. GPT got better results, but, sadly, an unusable file.
I then asked Fable to optimize it.
2 hours later, it landed a 1770% speedup in one case, 100%+ in other 4, and 22% in average. yes, in 2 hours it outperformed me, opus 4.8 and a swarm of gpt 5.5 agents, by one order of magnitude.
that could not possibly be legit. “it must be hardcoding the benchmarks” (GPT trauma). so I read its explanation and what it did was, indeed, the most high impact optimization one could try first. seems like HVM5 was wasting a lot of time garbage-collecting unused branches of pattern-match nodes. I had optimized that for static mats, but not for dynamic mats. skill issue. Fable figured how to do it for these, resulting in a massive speedup in some benches
but wait, is that *correct*? I’m not sure yet, it is credible, but this is the kind of thing that is very easy to get wrong on interaction nets. the problem is, when I was ready to start auditing Fable’s solution so I could tell whether it was buggy or legit, it interrupted me to tell me it had found a massive bug on the code *I* had written.
… wait, what?
so… for garbage collection purposes, I stored a bit on lambda term pointers that meant “the variable bound by this lambda has been freed, so, its lambda must free whatever argument it is applied to”. that’s fine. yet, on duplicator nodes, I also used the same bit to mean “one of the duplicated variables was freed, so, treat this dup as a passthrough no-op”. so, if a lambda entered a duplicator, it would mistake the lambda’s collection bit for its own, resulting in corrupted interaction!
that’s a mouthful, why I’m writing this?
just so you can appreciate the sheer absurdity of what just happened. I didn’t ask it to find bugs. I asked it for an optimization. and even if I did ask it to find bugs, this bug is so astonishingly subtle and specific, identifying it takes mastering the domain to an extent that is beyond even me. I’d easily need hours or days to fix it, *if* I ever came across it. chances are it would just go unnoticed. and Fable found it and fixed it like it was nothing, while it was busy adding a 17x speedup to a file that neither I, nor Opus 4.8, nor a fleet of GPT 5.5 managed to barely make 2x faster.
oh and there is also another tab where it is also ripping through Bend’s codebase and finishing everything I had to do
I don’t know what to say anymore
this isn’t about Anthropic or OpenAI, this is about our collective future as a species. the world is changing, and we need to be aware of it, and discuss how to handle this change.
receipt below . . .
there are multiple ways war mode can go wrong
burnout from lack of purpose (maybe not so likely in this case)
burnout from literal exhaustion (seems quite possible with the indicated amount of work)
loss of slack—working on the apparent most urgent things can lead to the loss of the bigger picture
escalation (at least if the effort is seen as going to capabilities which is maybe less likely here)
Am I mistaken, or does the Wasted Vote Refund method have a strategic voting problem? It seems voters are incentivized to be “losers” when they predict that their party will most likely win anyway, so that they gain extra votes next time. That could make the method unstable.
[Linkpost] Language Models Can Autonomously Hack and Self-Replicate
I do this too. Except, I do it the evening before—partly because my wakeup time may depend on it.
Except that I usually add more than a one-minute buffer, and sometimes a much larger one if I suspect I might be traveling. For calls that require nontrivial preparation, I set the alarm to −12 minutes; that gives me an extra warning when I snooze it.
does Unsupervised Agent Discovery fit your criteria?
This seems like an intervention that has a negative alignment tax.
Uh. I’m not sure what respect means here. Clearly, you wouldn’t do that with a teenager, where it wouldn’t work anyway. I’m not sure respect is a concept that makes much sense with toddlers. Maybe you can elaborate.
Yes, I think she is mistaken that this point doesn’t show up:
“LLMs can be approximated as a character on top of a base model, while humans are a character deep down”
And I think your LLM Psychology makes a good case for the differences. But the base model is not easily seen if you don’t aim for it or know what to look for.
That makes some of her points about the friendliness attractor less convincing.
I think there are two conflicting goals here: Speed of acquiring information vs quality of reaction/voting. As habryka writes, the information at the top helps with filtering. But the bias is probably real.
Some people might prefer one or the other. It would be nice if the UI could offer both. It would be great if the more effortful but epistemically more valuable mode were rewarded in some way, e.g. by doubling the applied karma.
This is the “what kind of minds are we even building” problem. …
we are building systems that could turn out to have that same cognitive property as humans and other animals: namely, having interests they actually care about. “What would it even look like to respect or ignore these interests?”
That intermediate problem of “having interests they actually care about” seems to be quite close to what Steven Byrnes describes in “We need a field of Reward Function Design”.
True, but a cost issue. My mother (of six) also used distraction a lot. It is a cheap, quick, and low-coercive intervention.
I’m not sure how much your and Jefftk’s (or Aella’s) approaches or attitudes really differ. I sure can imagine needing to intervene every five minutes with 1-, 3-, and 5-year-olds. I had boys those ages, four in fact, and at that age, you have a very busy life. And much of that is intervening: Taking away a thing that can break or hurt, “No, you can’t bite the candle.” Moderating play between them. “Don’t bite.” Limiting action that causes a mess to clean up. “Stop throwing the Bolognese.” Sure, that gets less over time, but if there is a sibling misalignment, you may be able to moderate and train it, and it may still change and resurface as they age and learn. Two of my boys were great playmates most of the time, but due to incompatible temperaments and perceptions they got into conflicts at least every week, and that meant fighting, hitting, and kicking of different types. Strategies, weapons, and defenses were invented. And despite all our best efforts at arguing, practicing, and pleading, this went on and off until late teenagehood. And then it suddenly stopped for good without any clear reason why. I was so relieved anyway.
ghiblified to protect the innocent
But I can also interpret what you say as meaning that you try your best to see them as people you do not own, whom you help develop their own personality and follow their own goals. I had many arguments with my children and listened to their positions and didn’t overrule them just because I could (I did have a tie-breaking vote in the family council though).
It would maybe help if you could describe a specific intervention with your 5-year-old in more detail.
Sorry for the late reply, I only got back to this by chance, actually. I reply here because I agree with your summary!
There is something that I do want to add though and it is related to my point here:
I don’t think that’s enough because you still need to ensure that the environment is sufficiently likely to begin with, with mechanisms such as rewarding smiles, touch inclinations, infant care instincts or whatever.
The thing I tried to point out with this is not the “C” you offered, but the features the environment has to provide that allow people to reliably learn the features that make your point 1 work properly, i.e., for the feeling to lock onto something predictive.
Examples: Infant gaze-locking with action feedback, caregiver care correlation, goal retry with relief/frustration, third-party help vs harm, and instances of people mattering.
Applying this to NNs seems to mean that we should expect (groups of) parameters to specialize for different functions if their “production curve” is convex and (groups of) parameters should be reused for multiple functions if their production curve is concave. That insight may help with interpretability. The question is if this is already known under different terminology among ML folks.
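To make the convex/concave intuition concrete, here is a minimal toy sketch, not a claim about real networks. It assumes two parameter groups, each with one unit of capacity to divide between two functions, and a hypothetical production curve `f` that maps allocated capacity to output. The names `total_output` and `best_split`, and the choice of `sqrt` and `x**2` as example curves, are illustrative assumptions of mine.

```python
import math

def total_output(f, split):
    """Total output of two parameter groups serving two functions.

    Each group has one unit of capacity. `split` is the share a group
    gives to its primary function (1.0 = fully specialized, 0.5 = fully
    shared); the remainder goes to the other function.
    """
    per_group = f(split) + f(1.0 - split)
    return 2 * per_group

def best_split(f, steps=100):
    """Grid-search the split in [0.5, 1.0] that maximizes total output."""
    candidates = [0.5 + 0.5 * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda s: total_output(f, s))

concave = math.sqrt            # diminishing returns to capacity
convex = lambda x: x * x       # increasing returns to capacity

print(best_split(concave))  # 0.5 -> reuse each group for both functions
print(best_split(convex))   # 1.0 -> each group specializes
```

With the concave curve the shared allocation wins (about 2.83 versus 2.0 total output), and with the convex curve full specialization wins (2.0 versus 1.0), which is the pattern described above.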
I’m not a deep ML researcher, but here is what ChatGPT says about how different parts of the training lead to more “convex” or “concave” effects:
ChatGPT 5.4 Long Reasoning
only when the current representation and gradient geometry make shared use of parameters costly. In SGD terms, the cleanest local signal is usually gradient interference.
For a shared parameter block $\theta$, let $g_A = \nabla_\theta L_A$ and $g_B = \nabla_\theta L_B$. For a small step, the first-order improvement from using the same parameters for both is roughly driven by $\|g_A + g_B\|^2$, while splitting capacity lets you get something closer to $\|g_A\|^2 + \|g_B\|^2$. The difference is the cross-term:
$$\|g_A + g_B\|^2 - \left(\|g_A\|^2 + \|g_B\|^2\right) = 2\, g_A^\top g_B$$
So, locally, specialization is favored when $g_A^\top g_B < 0$: the two functions are trying to push the same parameters in conflicting directions. This is exactly the multitask “negative transfer” picture, and methods that enforce more orthogonal gradients are motivated by reducing that competition. [Regularizing Deep Multi-Task Networks using Orthogonal Gradients]
That gives a useful phase picture for SGD.
Very early training: specialization is usually weakest.
In wide nets near initialization, training can be close to the lazy / kernel regime, where the network mostly reweights random features instead of strongly reorganizing them. In that regime, hidden units are still largely interchangeable, and the “shared mixed unit” often wins because there is not yet enough learned structure for durable task-specific interference to appear. Feature learning, which is the regime where internal specialization can really emerge, is precisely the regime beyond that lazy behavior. [Disentangling feature and lazy training in deep neural networks]
Mid training: this is where specialization most plausibly appears.
Once hidden features begin to move, two things happen: first, symmetry between nominally equivalent units can break; second, some units become slightly better at one subfunction than another, and further SGD updates reinforce that asymmetry. In teacher–student analyses of layered networks, this shows up as a specialization transition, i.e. a move from an unspecialized symmetric phase to a specialized phase where hidden units take on different roles. For ReLU networks this transition is reported as continuous rather than abrupt in the recent statistical-physics analyses. [The Implicit Bias of Gradient Noise: A Symmetry Perspective]
Late training: specialization often stops increasing in the same sense.
In classification settings, there is evidence for a terminal phase of training where last-layer representations undergo neural collapse: within-class variation shrinks and class means become arranged in a highly symmetric geometry. That is a kind of sharpening and consolidation, but not necessarily further functional diversification of internal parts. So late training often looks less like “wood vs leaves keep splitting” and more like “the learned class geometry is being compressed into a cleaner final arrangement.”
We could also ask the other way around: If early training is more concave and mid-training is convex, what does this imply for markets?
Presumably, in early, concave markets, traders offer multiple goods.
In mid, convex markets, traders specialize in few or a single product.
And in late markets?
if I can plause something, you can’t really stop me.
typo?
While that may be logically true in some sense of those words, I’m not sure that even very advanced AIs will reason like that, because a) humans do not reason like that and AIs “reason” at least partly like humans, and b) all the ambiguity of those words can lead to non-intuitive interactions of the logical claims.
My model of the differences between
a) a human imagining different characters (such as what a person might say to you) vs being aware itself, and
b) an LLM imagining different characters (such as the JFK example above) vs creating the assistant personality
is that the self-perspective of a human is privileged in that it is controlling the body of the human, and the brain always knows which is which (even if we ourselves may not always be fully aware of that, such as in a dream). At least that was my model until your post. Two points make me wonder.
you argue that the LLM has some consistency constraints (via the environment/conversation) that are not completely unlike having a body:
Given some reflectivity, a model could likely figure out it isn’t JFK just from its own outputs – for example, it understands basically all common human languages and all common programming languages, which is inconsistent with what’s known about JFK.
The symmetry breaks because the Assistant and JFK are very different as self-models. The Assistant is not perfect or completely true, but it is a far more viable self-model than JFK. If you are an AI playing the Assistant character, reality will most likely play along. There will be users, Python interpreters, memory files, and so on.
your footnote 2 points out:
It’s not common, but human brains can also switch into believing the human is JFK, Jesus Christ, or some other similar character.
I think this doesn’t fully invalidate the difference between humans and LLMs in this regard, because there is, currently at least, more body-specific reward/attention wiring in humans that is not present in LLMs. Robots will likely blur this separation, as will things like persona steering.
Although importantly the implications are often not at all salient to them.

We can reverse your logic to find the memory.
You seem to say that if a regulator has to carry information across time through a bottleneck M, and it has to stay competent across many situations, then M is forced into a posterior shape.
This suggests a detection procedure:
Treat M as latent and search for it inside a system. Given access to an agent’s internal state I_t, identify the subset(s) that are kept around from T_1 to T_2.
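A toy sketch of that first step, under strong assumptions that are mine and not from your post: the internal state I_t is a fixed-length numeric vector, we can record it at T_1 and T_2 over many episodes, and "kept around" is approximated by a high per-component Pearson correlation between the two times. The function names, the threshold, and the synthetic agent are all hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def persistent_components(states_t1, states_t2, threshold=0.9):
    """Indices of state components whose T_1 value predicts their T_2 value."""
    dim = len(states_t1[0])
    keep = []
    for i in range(dim):
        r = pearson([s[i] for s in states_t1], [s[i] for s in states_t2])
        if abs(r) >= threshold:
            keep.append(i)
    return keep

# Synthetic agent: component 0 is carried over exactly, component 1 is
# carried over with small noise, component 2 is overwritten by fresh input.
rng = random.Random(0)
t1, t2 = [], []
for _ in range(200):
    s = [rng.gauss(0, 1) for _ in range(3)]
    t1.append(s)
    t2.append([s[0], s[1] + rng.gauss(0, 0.1), rng.gauss(0, 1)])

print(persistent_components(t1, t2))  # [0, 1]
```

This only catches memory that stays in the same coordinates. A memory that is re-encoded between T_1 and T_2 would need something like mutual information or a learned probe instead, and checking whether the candidate M actually has a posterior shape is a separate step not attempted here.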