Scale invariance is itself an emergent phenomenon.
Imagine scaling something up, say a physical law. If it changes, it is obviously not scale invariant, and it will generally keep changing with each further scale-up. If it does not change, it has reached a fixed point and will not change at the next scale-up either!
Scale invariances are just fixed points of coarse-graining.
Therefore, we should expect anything we think of as scale invariant to break down at small scales. For instance, electric charge is not scale invariant at small scales: the effective charge runs with the energy scale!
In the opposite direction: We should expect our physical laws to continue holding for the macro scale, if they are fixed points of scaling. This also explains the ubiquity of power laws in the natural sciences; power laws are the only relations that are scale invariant and thus preserved!
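A quick way to see the power-law claim, as a sketch (the standard one-line argument, assuming the relation is differentiable):

```latex
% Scale invariance means f(\lambda x) = C(\lambda)\, f(x) for every rescaling \lambda > 0.
% Differentiating in \lambda and setting \lambda = 1:
\[
  x f'(x) = C'(1)\, f(x)
  \quad\Longrightarrow\quad
  f(x) = f(1)\, x^{k}, \qquad k = C'(1),
\]
% so the only (differentiable) scale-invariant relations are power laws.
```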
All of this may seem tautological but is actually truly strange. To me this indicates that we should expect to be very, very far from the actual substrate of the universe.
Now go forth and study renormalisation group flow! ;)
Epistemic status: Just riffing!
This sounds like a fascinating insight, but I think I may be missing some physics context to fully understand.
Why is it that the derived laws approximating a true underlying physical law are expected to stay scale invariant over increasing scale after being scale invariant for two steps? Is there a reason that there can’t be a scale invariant region that goes back to being scale variant at large enough scales just like it does at small enough scales?
The act of coarse-graining/scaling up (an RG transformation) changes the theory that describes the system, specifically the theory's parameters. If you consider the space of all theories and iterate the coarse-graining, this induces a flow in which each theory is mapped to its coarse-grained version. This flow may possess attractors, that is, stable fixed points x*, meaning that when you apply the coarse-graining you get the same theory back.
And if f(x*)=x* then obviously f(f(x*))=x*, i.e. any repeated application will still yield the fixed point.
So you can scale up as much as you want; entering a fixed point really is a one-way street: you can check out any time you like, but you can never leave!
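As a toy illustration of "flow to a fixed point" (my own sketch in Python, not part of the argument above): the exact decimation RG for the 1D Ising chain maps the coupling K to K' with tanh K' = tanh²K. Iterating it drives any finite coupling to the trivial fixed point K = 0, which then maps to itself forever.

```python
import math

def decimate(K: float) -> float:
    """One RG step for the 1D Ising chain: integrate out every other spin.
    The renormalised coupling K' satisfies tanh(K') = tanh(K)**2."""
    return math.atanh(math.tanh(K) ** 2)

# Iterate the coarse-graining map from a few starting couplings.
for K0 in (0.1, 1.0, 3.0):
    K = K0
    trajectory = [K]
    for _ in range(8):
        K = decimate(K)
        trajectory.append(K)
    print(f"K0 = {K0}: " + " -> ".join(f"{k:.4f}" for k in trajectory))

# Every finite K flows towards the trivial fixed point K* = 0 (no phase
# transition in 1D), and K* = 0 maps exactly to itself: once you hit a
# fixed point of the coarse-graining, you stay there.
```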
The main source of scale invariance itself probably has to do with symmetry, meaning that an object has a particular property that is preserved across scales.
Space symmetry is an example: the basic physical laws are preserved across all scales of spacetime. In particular, scaling a system down doesn't mean different laws of physics apply at different scales; there is only one physical law, which produces varied consequences at all scales.
You’re making an interesting connection to symmetry! But scale invariance as discussed here is actually emergent—it arises when theories reach fixed points under coarse-graining, rather than being a fundamental symmetry of space. This is why quantities like electric charge can change with scale, despite spacetime symmetries remaining intact.
And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.
While this is an interesting idea, I do still think space symmetries are likely to remain fundamental features of physics, rather than being emergent out of some other process.
I’ll bet you! ;)
Sadly my claim is somewhat unfalsifiable, because the emergence might always be hiding at some smaller scale, but I would be surprised if we find the theory that the Standard Model emerges from and it contains classical spacetime.
I did a little search, and if it’s worth anything Witten and Wheeler agree: https://www.quantamagazine.org/edward-witten-ponders-the-nature-of-reality-20171128/ (just search for ‘emergent’ in the article)
Can you have emergent spacetime while space symmetry remains a bedrock fundamental principle, and not emergent of something else?
I don’t know if that is a meaningful question.
Consider this: a cube is something that is symmetric under the octahedral group—that’s what *makes* it a cube. If it wasn’t symmetric under these transformations, it wouldn’t be a cube. So also with spacetime—it’s something that transforms according to the Poincaré group (plus some other mathematical properties, metric etc.). That’s what makes it spacetime.
So space symmetry is always assumed when we talk about spacetime, and if space symmetry didn’t hold, spacetime as we know it would not work/exist?
As a corollary: maybe power laws for AI should not surprise us; they are simply the default outcome of scaling.
Simplified: the Solomonoff prior is the distribution you get when you take a uniform distribution over all strings and feed them to a universal Turing machine.
Since the outputs are also strings: what happens if we iterate this? What is the stationary distribution? Is there even one? The fixed points will be quines, programs that copy their source code to the output. But how are they weighted? By their length? Presumably you can also have quine-cycles of programs that generate each other in turn, in a manner reminiscent of metagenesis. Do these quine-cycles capture all probability mass or does some diverge?
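To make the question concrete, here is a toy finite analogue (my own sketch, not a faithful model of Solomonoff induction): treat a handful of strings as "programs", let a lookup table play the role of the machine, start from the uniform distribution, and iterate the pushforward. Quines are fixed points, quine-cycles are periodic orbits, and mass on outputs outside the program set stands in for diverging mass.

```python
from collections import Counter

# A toy "machine": each program (a string) deterministically outputs a string.
# 'a' is a quine, 'b' and 'c' form a quine-cycle, 'd' and 'e' feed into them,
# and 'f' outputs something outside the program set (mass "escapes").
EVAL = {
    "a": "a",      # quine
    "b": "c",      # 2-cycle
    "c": "b",
    "d": "a",      # falls into the quine
    "e": "b",      # falls into the cycle
    "f": "zzzz",   # leaves the program set
}

def pushforward(dist: Counter) -> Counter:
    """One iteration: feed the current distribution through the machine."""
    out = Counter()
    for prog, p in dist.items():
        out[EVAL.get(prog, prog)] += p  # non-programs just sit where they are
    return out

dist = Counter({p: 1 / len(EVAL) for p in EVAL})  # "uniform prior"
for step in range(5):
    print(step, dict(dist))
    dist = pushforward(dist)
```

In this toy version the quine and the 2-cycle soak up most of the mass, and the cycle already shows how pointwise convergence can fail; whether the analogous statements hold for the real iterated Solomonoff prior is exactly what I am asking.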
Very grateful for answers and literature suggestions.
A few quick observations (each with like 90% confidence; I won’t provide detailed arguments atm, but feel free to LW-msg me for more details):
Any finite number of iterates just gives you the Solomonoff distribution up to at most a constant multiplicative difference (with the constant depending on how many iterates you do). My other points will be about the limit as we iterate many times.
The quines will have mass at least their prior, upweighted by some const because of programs which do not produce an infinite output string. They will generally have more mass than that, and some will gain mass by a larger multiplicative factor than others, but idk how to say something nice about this further.
Yes, you can have quine-cycles. Relevant tho not exactly this: https://github.com/mame/quine-relay
As you do more and more iterates, there's no convergence to a stationary distribution, at least in total variation distance. One reason is that you can write a quine which adds a string to itself (and then adds the same string again next time, and so on)[1], creating "a way for a finite chunk of probability to escape to infinity" (a minimal sketch of such a program follows below the footnote). So yes, some mass diverges.
Quine-cycles imply (or at least very strongly suggest) probabilities also do not converge pointwise.
What about pointwise convergence when we also average over the number of iterates? It seems plausible you get convergence then, but not sure (and not sure if this would be an interesting claim). It would be true if we could somehow think of the problem as living on a directed graph with countably many vertices, but idk how to do that atm.
There are many different stationary distributions — e.g. you could choose any distribution on the quines.
a construction from o3-mini-high: https://colab.research.google.com/drive/1kIGCiDzWT3guCskgmjX5oNoYxsImQre-?usp=sharing
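For concreteness, a minimal sketch (mine, separate from the linked construction) of the kind of self-extending quine meant in [1], assuming Python:

```python
# A "growing quine": its output is a three-line program that reproduces
# itself with one more "x" appended to `data` each time it is run, so the
# orbit program -> output -> output-of-output ... marches off to ever
# longer strings and never settles on a fixed point.
data = ""
src = 'data = %r\nsrc = %r\nprint(src %% (data + "x", src))'
print(src % (data + "x", src))
```

Each output in this chain is a perfectly good program with comparable prior mass, but the orbit never repeats, which is the "probability escaping to infinity" picture.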
Very relevant: https://web.archive.org/web/20090608111223/http://www.paul-almond.com/WhatIsALowLevelLanguage.htm
Thank you! I’ll have a look!
Is there an anthropic reason or computational (Solomonoff-pilled) argument for why we would expect the computational/causal graph of the universe to be this local (sparse)? Or at least appear local to a first approximation (Bell's inequality).
This seems like a quite special property. I suspect that either
it is not as rare in e.g. the Solomonoff prior as we might first intuit (see the toy sketch below), or
we should expect this for anthropic reasons, e.g. it is really hard to develop intelligence/do predictions in nonlocal universes.
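On the first branch, one weak hint that locality is cheap under a simplicity prior is that a fully local universe can be specified by a very short program. A minimal sketch, assuming Python, with elementary cellular automaton rule 110 as the stand-in "local universe":

```python
# A local "universe": each cell's next state depends only on its immediate
# neighbourhood. The entire dynamics fits in a few lines, i.e. it has very
# low description length, so local laws are at least not obviously penalised
# by a simplicity prior.
RULE = 110  # any elementary CA rule number would do

def step(cells: list[int]) -> list[int]:
    n = len(cells)
    out = []
    for i in range(n):
        # neighbourhood of radius 1, periodic boundary
        pattern = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((RULE >> pattern) & 1)
    return out

state = [0] * 31
state[15] = 1  # a single "particle" in the middle
for _ in range(16):
    print("".join(".#"[c] for c in state))
    state = step(state)
```

A nonlocal update rule, by contrast, generically has to specify who interacts with whom, which tends to cost description length; whether that intuition survives a proper counting argument is exactly the open question.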
In physics, it is sometimes asked why there should be just three (large) space dimensions. No one really knows, but there are various mathematical properties unique to three or four dimensions, to which appeal is sometimes made.
I would also consider the recent (last few decades) interest in the emergence of spatial dimensions from entanglement. It may be that your question can be answered by considering these two things together.
Simplicity Priors are Tautological
Any non-uniform prior inherently encodes a bias toward simplicity. This isn’t an additional assumption we need to make—it falls directly out of the mathematics.
For any hypothesis h, the information content is $I(h) = -\log P(h)$, which means probability and complexity have an exponential relationship: $P(h) = e^{-I(h)}$.
This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.
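A minimal numeric sketch of the identity (assuming Python; the hypothesis names and probabilities are made up for illustration):

```python
import math

# Any prior assigns each hypothesis an information content I(h) = -log2 P(h);
# exponentiating back recovers the probability, P(h) = 2**(-I(h)).
prior = {"h_simple": 0.5, "h_medium": 0.25, "h_complex": 0.125, "h_baroque": 0.125}

for h, p in prior.items():
    info = -math.log2(p)  # "complexity" in bits under this prior
    print(f"{h:10s}  P = {p:.3f}  I = {info:.1f} bits  2^-I = {2 ** -info:.3f}")

# Higher-probability hypotheses get shorter codewords (fewer bits), so any
# non-uniform prior already acts as a simplicity prior relative to its own code.
```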
The “simplicity prior” is essentially tautological—more probable things are simple by definition.
You can have a hypothesis with really high Kolmogorov complexity, but if the hypothesis is true 50% of the time, it will require 1 bit of information to specify with respect to a coding scheme that merely points to cached hypotheses.
This is why, when Kolmogorov complexity is defined, it's with respect to a fixed universal description language; otherwise you're right, it's vacuous to talk about the simplicity of a hypothesis.
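A small sketch of the point (assuming Python; both "languages" here are toy stand-ins, not real universal machines):

```python
import math
import random

random.seed(0)
# A hypothesis that is algorithmically messy: 128 random bits.
messy_hypothesis = "".join(random.choice("01") for _ in range(128))

# Language A: spell the hypothesis out literally -> ~128 bits.
bits_literal = len(messy_hypothesis)

# Language B: a "cached" scheme that already lists this hypothesis among two
# equally likely options -> 1 bit to pick it out.
cache = [messy_hypothesis, "some other hypothesis"]
bits_cached = math.ceil(math.log2(len(cache)))

print(bits_literal, bits_cached)  # 128 vs 1

# Both are valid description schemes; they just disagree about which things
# count as "simple". The invariance theorem only pins complexity down, up to
# an additive constant, once a fixed universal language is chosen.
```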
The Red Queen’s Race in Weight Space
In evolution we can tell a story that genes are selected not only for their function, but also for how easily modifiable they are. For example, having a generic antibiotic gene is much more useful than having an antibiotic gene locked into one target and far, in edit-distance terms, from any other useful variant.
Why would we expect the generic gene to be more common? There is selection pressure on having modifiable genes because environments are constantly shifting (the Red Queen hypothesis). Genes are modules with evolvability baked in by past selection.
Can we make a similar argument for circuits/features/modes in NNs? Obviously it is better to have a more general circuit, but can we also argue that “multitool circuits” are not only better at generalising but also more likely to be found?
SGD does not optimise loss but rather something like free energy, taking degeneracy (multiplicity) into account with some effective temperature.
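One way to make that heuristic concrete (a standard stationary-distribution approximation, not a precise claim about SGD):

```latex
% Heuristic: treat noisy SGD as sampling from a stationary distribution
%   \rho(w) \propto \exp(-L(w)/T_{\mathrm{eff}}).
% The probability of ending up in a basin B is then roughly
\[
  P(B) = \int_{B} \rho(w)\, dw
       \;\approx\; \exp\!\Big(-\tfrac{1}{T_{\mathrm{eff}}}\,\big[\,L_{B} - T_{\mathrm{eff}}\log \mathrm{Vol}(B)\,\big]\Big)
       = e^{-F(B)/T_{\mathrm{eff}}},
\]
% i.e. an effective free energy F(B) = L_{B} - T_{\mathrm{eff}}\log\mathrm{Vol}(B):
% flatter, more degenerate basins (more volume at a given loss) are favoured.
```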
But evolvability seems distinct from degeneracy. Degeneracy is a property of a single loss landscape, while evolvability is a claim about distribution shift. And the claim is not “I have low loss in the new distribution” but rather “I am very close to a low-loss solution of the new distribution.”
Degeneracy in ML ≈ mutational robustness in biology, which is straightforward, but that is not what I am pointing at here. Evolvability is closer to out-of-distribution adaptivity: the ability to move quickly into a new optimum with small changes.
Are there experiments where a model is trained on a shifting distribution?
Is the shifting distribution relevant or can this just as well be modeled as a mixture of the distributions, and what we think of as OOD is actually in the mixture distribution? In that case degeneracy is all you need.
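A sketch of the experiment I have in mind (assuming Python/NumPy; the tasks, schedule, and metric are all placeholders): train a small model while the data distribution alternates between two related tasks, compare against a control trained on the static 50/50 mixture, and measure how many steps the shifting-distribution model needs after each switch to get back to low loss.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 20                                 # input dimension
w_a = rng.normal(size=D)               # task A: a linear teacher
w_b = w_a + 0.5 * rng.normal(size=D)   # task B: a nearby, shifted teacher

def batch(teacher, n=64):
    X = rng.normal(size=(n, D))
    return X, X @ teacher

def sgd_run(schedule, steps=4000, lr=0.01):
    """Train a linear student; `schedule(t)` returns the current teacher."""
    w = np.zeros(D)
    losses = []
    for t in range(steps):
        X, y = batch(schedule(t))
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared error
        w -= lr * grad
        losses.append(float(np.mean((X @ w - y) ** 2)))
    return np.array(losses)

period = 500
shifting = sgd_run(lambda t: w_a if (t // period) % 2 == 0 else w_b)
mixture  = sgd_run(lambda t: w_a if rng.random() < 0.5 else w_b)

# Recovery time: steps after each switch until loss is back under a threshold.
threshold = 0.1
for k in range(1, 8):
    window = shifting[k * period:(k + 1) * period]
    recovery = int(np.argmax(window < threshold)) if (window < threshold).any() else None
    print(f"switch {k}: recovery after {recovery} steps")
print(f"mixture baseline, final loss: {mixture[-500:].mean():.3f}")
```

The interesting comparison is whether solutions found under the shifting schedule end up closer (in parameter distance or fine-tuning steps) to both teachers than the mixture solution does, which is the evolvability-versus-degeneracy distinction above.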
Related ideas: cryptographic one-way functions (examples of unevolvable designs), out-of-distribution generalisation, mode connectivity.