Empirics reigns, and approaches that ignore it, trying nonetheless to accomplish great and difficult science without binding themselves tightly to feedback loops, almost universally fail.
Many of our most foundational concepts have stemmed from first principles/philosophical/mathematical thinking! Examples here abound: Einstein’s thought experiments about simultaneity and relativity, Szilard’s proposed resolution to Maxwell’s demon, many of Galileo’s concepts (instantaneous velocity, relativity, the equivalence principle), Landauer’s limit, logic (e.g., Aristotle, Frege, Boole), information theory, Schrödinger’s prediction that the hereditary material was an aperiodic crystal, Turing machines, etc. So it seems odd, imo, to portray this track record as a near-universal failure of the approach.
But there is a huge selection effect here. You only ever hear about the cool math stuff that becomes useful later on, because that’s so interesting; you don’t hear about stuff that’s left in the dustbin of history.
I agree there are selection effects, although I think this is true of empirical work too: the vast majority of experiments are also left in the dustbin. Which certainly isn’t to say that empirical approaches are doomed by the outside view, or that science is doomed in general, just that using base rates to rule out whole approaches seems misguided to me. Not only because one ought to choose which approach makes sense based on the nature of the problem itself, but also because base rates alone don’t account for the value of the successes. And as far as I can tell, the concepts we’ve gained from this sort of philosophical and mathematical thinking (including but certainly not limited to those above) have accounted for a very large share of the total progress of science to date. Such that even if I restrict myself to the outside view, the expected value here still seems quite motivating to me.
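To make the expected-value point concrete, here is a toy calculation (the numbers are invented purely for illustration, not estimates): suppose only one theoretical program in a hundred produces a foundational concept (p = 0.01), but each success is worth on the order of a thousand typical empirical results (V = 1000). Then, per program,

E[value] = p × V = 0.01 × 1000 = 10,

i.e., ten typical empirical results in expectation, despite the 99% failure rate. The base rate alone, without the payoff term, tells you almost nothing.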
Many of our most foundational concepts have stemmed from first principles/philosophical/mathematical thinking
Conflating “philosophy” and “mathematics” is another instance of the kind of sloppy thinking I’m warning against in my previous comment.
The former[1] is necessary and useful, if only because making sense of what we observe requires us to sit down and peruse our models of the world and adjust and update them. And also because we get to generate “thought experiments” that give us more data with which to test our theories.[2]
The latter, as a basic categorical matter, is not the same as the former. “Mathematics” has a siren-like seductive quality to those who are mathematically inclined. It comes across, based not just on structure but also on vibes and atmosphere, as giving certainty and rigor and robustness. But that’s all entirely unjustified until you know the mathematical model you are employing is actually useful for the problem at hand.
So it seems odd, imo, to portray this track record as a near-universal failure of the approach.
Of what approach?
Of the approach that “it’s hard to even think of how experiments would be relevant to what I’m doing,” as Alex Altair wrote above? The only reason all those theories you mentioned ultimately succeeded and were refined into something that closely approximated reality is that, after initial, flawed versions of them were proposed, scientists looked very hard at experiments to verify them, iron out their flaws, and in some cases throw away completely mistaken approaches. Precisely the type of feedback loop that’s necessary to do science.
This approach, the one the post talks about, has indeed failed universally.
I agree there are selection effects, although I think this is true of empirical work too: the vast majority of experiments are also left in the dustbin.
Yes, the vast majority of theories and results are left in the dustbin after our predictions make contact and are contrasted with our observations. Precisely my point. That’s the system working as intended.
Which certainly isn’t to say that empirical approaches are doomed by the outside view
… what? What does this have to do with anything that came before it? The fact that approaches are ruled out is a benefit, not a flaw, of empirics. It’s a feature, not a bug. It’s precisely what makes it work. Why would this ever say anything negative about empirical approaches?
By contrast, if “it’s hard to even think of how experiments would be relevant to what I’m doing,” you have precisely zero means of ever determining that your theories are inappropriate for the question at hand. For you can keep working in, and living in, the separate magisterium of mathematics, rigorously proving lemmas and theorems and results with the iron certainty of mathematical proof, all without binding yourself to what matters most.
Not only because one ought to choose which approach makes sense based on the nature of the problem itself
Taking this into account makes agent foundations look worse, not better.
As I’ve written about before, the fundamental models and patterns of thought embedded in these frameworks were developed significantly before Deep Learning and LLM-type models took over. “A bunch of models that seem both woefully underpowered for the Wicked Problems they must solve and also destined to underfit their target, for they (currently) all exist and supposedly apply independently of the particular architecture, algorithms, training data, scaffolding etc., that will result in the first batch of really powerful AIs,” as I said in that comment. The bottom line was written down long before it was appropriate to do so.
but also because base rates alone don’t account for the value of the successes
And if I look at what agent foundations-type researchers are concluding on the basis of their purely theoretical mathematical vibing, I see precisely the types of misunderstandings, flaws, and abject nonsense that you’d expect when someone gets away with not having to match their theories up with empirical observations.[3]
Case in point: John Wentworth claiming he has “put together an agent model which resolved all of [his] own most pressing outstanding confusions about the type-signature of human values,” when in fact many users here have explained in detail[4] why his hypotheses are entirely incompatible with reality.[5]
Such that even if I restrict myself to the outside view, the expected value here still seems quite motivating to me.
I don’t think I ever claimed restricting to the outside view is the proper thing to do here. I do think I made specific arguments for why it shouldn’t feel motivating.
[1] Which, mind you, we barely understand at a mechanistic/rigorous/“mathematical” level, if at all.
[2] Which is what the vast majority of your examples are about.
[3] And also the kinds of flaws that prevent whatever results are obtained from actually matching up with reality, even if the theorems themselves are mathematically correct.
[4] See also this.
[5] And has that stopped him? Of course not, nor do I expect any further discussion to. Because the conclusions he has reached, although they don’t make sense in empirical reality, do make sense inside the mathematical models he is creating for his Natural Abstractions work. This is reifying the model and elevating it over reality, an even worse epistemic flaw than conflating the two.
The one time he confessed he had been working on “speedrun[ning] the theory-practice gap” and creating a target product with practical applicability, it failed. Two years prior, he had written: “Note that ‘theory progressing faster than expected, practice slower’ is a potential red flag for theory coming decoupled from reality, though in this case the difference from expectations is small enough that I’m not too worried. Yet.” But he doesn’t seem all that worried now, either.
By contrast, if “it’s hard to even think of how experiments would be relevant to what I’m doing,” you have precisely zero means of ever determining that your theories are inappropriate for the question at hand.
Here, you’ve gotten too hyperbolic about what I said. When I say “experiments”, I don’t mean “any contact with reality”. And when I said “what I’m doing”, I didn’t mean “anything I will ever do”. Some people I talk to seem to think it’s weird that I never run PyTorch, and that’s the kind of thing where I can’t think of how it would be relevant to what I’m currently doing.
When trying to formulate conjectures, I am constantly fretting about whether various assumptions match reality well enough. And when I do have a theory that is at the point where it’s making strong claims, I will start to work out concrete ways to apply it.
But I don’t even have one yet, so there’s not really anything to check. I’m not sure how long people are expecting this to take, and this difference in expectation might be one of the implicit things driving the confusion. For as many theorems as end up in the dustbin, there is even more pre-theorem work that ends up there. I’ve been at this for three and change years, and I would not be surprised if it takes a few more. But the entire point is to apply it, so I can certainly imagine conditions under which we end up finding out whether the theory applies to reality.