Gram Stone comments on Do confident short timelines make sense?

Gram Stone 16 Jul 2025 2:52 UTC
1 point
0
So I guess my model says that AGI R&D is basically the opposite of human evolution in a certain sense, actually basically all of the cognitive architecture necessary to cough up a human was in place by the speciation of chimpanzees, if not macaques, it really was just a matter of scaling (including having the selection pressures and metabolic resources necessary to connect previously unconnected modules, and don’t underestimate the generality of a ‘module,’ on my model), but like, if this took several complicated pieces, then if you’re relying on a different dependency structure (possibly like modern AI research, which the weirdly anachronistic capabilities of LLMs strongly suggest) with tons of money, time, energy, and hardware, you could enjoy way more abundance than human evolution and make the last of the algorithmic improvements that evolution made and suddenly get a system at least as capable as the human algorithm with that much hardware, and take over the world. I’m saying you should imagine human evolution thus far as having made way more algorithmic progress than our civilization because it was strongly constrained by resource availability.

I don’t have code that would end the world if I ran it, nor would I admit it if I did, but I feel like I have a good enough account of civilizational inadequacy in this domain, and a good enough model of human cognitive evolution, and ‘cultural evolution’, to conclude that LLMs are a massive enough boon to research productivity for key individuals/organizations to be a serious threat? I guess I feel like bimodal distributions can be reasonable by some kind of qualitative reasoning like, “How likely is it that I am merely two insights from AGI, as opposed to one or many?”

If I had to share some things that I don’t think would quickly end the world by being shared, given that they’re already public, and given that, if me pointing out this difference is likely to quickly end the world, it’s significant evidence in favor of massive civilizational inadequacy in this domain, which I would like everyone else to believe if it’s true and could save everyone, I guess the thing I would share would be the consilience between neural radiance fields and the constructive episodic simulation adaptations of modern humans? Like this is world-model-type stuff. If you can generate world-models from 2d visual frames at all, it seems to me that you are massively along the path of constructive episodic simulation, which gives you all sorts of benefits, like perspective-taking, episodic memory, prospective planning, detailed counterfactuals, I don’t know where to stop and I still don’t think this is the The Xth Insight.
- TsviBT 16 Jul 2025 5:56 UTC
  2 points
  0
  Parent
  
  I guess I feel like bimodal distributions can be reasonable by some kind of qualitative reasoning like, “How likely is it that I am merely two insights from AGI, as opposed to one or many?”
  
  I deny this and have no idea how you or anyone thinks you can do this (but therefore I can’t be too too confident).
  
  If you can generate world-models from 2d visual frames at all, it seems to me that you are massively along the path of constructive episodic simulation,
  
  Meh. I think you’re discounting the background stuff that goes into the way humans do it. For example, we have additional juice going into “which representations should I use, given that I wanted to play around with / remember about / plan using my representation of this thingy?”. NeRFs are not going to support detailed counterfactuals very well out of the box I don’t think! Maybe well enough for self-driving cars that at least avoid crashing; but not well enough to e.g. become interested in an anomaly, zoom in, do science, and construct a better representation which can then be theorized about.
  - Gram Stone 23 Jul 2025 6:07 UTC
    1 point
    0
    Parent
    I think you’re discounting the background stuff
    Yes, we should distinguish between the ability to generate counterfactuals at all versus being able to use that ability instrumentally, but I was kind of trying to preempt this frame with “still don’t think this is the The Xth Insight.”
    
    I mean, NeRFs were the beginning, but we can already generate 3d Gaussian splats from text prompts or single images, do semantic occupancy prediction in 3d splats, construct 4d splats from monocular video, do real-time 4d splats with enough cameras, etc., and it seems to me that doing these things opens the way to creating synthetic datasets of semantic 4d splats, which it further seems you could use to train generative models that would constitute Constructive Episodic Simulators, in which case on my model, actually yes, something akin to human episodic imagination, if not ‘true’ counterfactuals, should come right out of the box. By themselves, of course these modules will sometimes produce volumetric video analogs of the hallucinations we see in LLMs, not necessarily be very agentic by default, etc., so I don’t think achieving this goal immediately kills everyone, but it seems like an essential part of something that could.
    At the very least I guess I’m predicting that we’re going to get some killer VR apps in the next few years featuring models that can generate volumetric video at interactive frame rates.
    - TsviBT 23 Jul 2025 9:03 UTC
      2 points
      0
      Parent
      
      in which case on my model, actually yes, something akin to human episodic imagination, if not ‘true’ counterfactuals, should come right out of the box
      
      but it seems like an essential part of something that could [kill everyone]
      
      I don’t know how strong of a / what kind of a claim you’re trying to make here… Are you claiming NeRFs represent a substantial chunk of the Xth algorithmic insight? Or not an algorithmic part, but rather setting up a data source with which someone can make the Xth insight? Or...?
      - Gram Stone 24 Jul 2025 2:52 UTC
        1 point
        0
        Parent
        Are you claiming NeRFs represent a substantial chunk of the Xth algorithmic insight?
        I’m claiming that any future model that generates semantically rich volumetric histories seems to me to be implementing a simpler version of humans’ constructive episodic simulation adaptations, of which episodic counterfactuals, episodic memories, imaginary scenes, imagining perspectives you haven’t actually experienced, episodic prospection, dreams, etc. are special cases.
        
        So ‘antepenultimate algorithmic insight,’ and ‘one of just a few remaining puzzle pieces in a lethal neuromorphic architecture’ both strike me as relatively fair characterizations. I have this intuition that some progress can be characterized more as a recomposition of existing tricks, whereas some tricks are genuinely new under the sun, which makes me want to make this distinction between architecture and algorithms, even though in the common sense every architecture is an algorithm; this feels fuzzy, relative, and not super defensible, so I won’t insist on it. But to describe my view through the lens of this distinction, more capable generalizations of stuff like MAV3D would be a critical module (algorithmic-level) in a generally intelligent neuromorphic architecture. Yes, you need other modules for this architecture to efficiently search for episodic simulations in a way that effectively guides action, and for taking simulation as an action itself and learning when and how to episodically simulate, and so on, but I’m specifically trying not to describe a roadmap here.
        Or not an algorithmic part, but rather setting up a data source
        As far as I know we’re nowhere near exploiting existing video corpora as much as we could for training things like MAV3D, and yes, it seems to me we would be well-positioned to build synthetic datasets for future generative volumetric video models from the outputs of earlier models trained on videos of real scenes, and perhaps from data on VR users controlling avatars in interactive volumetric scenes as well. It seems like this would be easy for Meta to do. I’m more sure that this generates data that can be used for making better versions of this particular module, and less sure that this would be useful for generating data for other modules that I think necessary, but I do have at least one hypothesis in that direction.
        TsviBT 24 Jul 2025 3:13 UTC
        2 points
        0
        Parent
        
        So ‘antepenultimate algorithmic insight,’ and ‘one of just a few remaining puzzle pieces in a lethal neuromorphic architecture’ both strike me as relatively fair characterizations.
        
        Ok. This is pretty implausible to me. Bagiński’s whack-a-mole thing seems relevant here, as well as the bitter lesson. Bolting MAV3D into your system seems like the contemporary equivalent of manually writing convolution filters in your computer vision system. You’re not striking at the relevant level of generality. In other words, in humans, all the power comes from stuff other than a MAV3D-like thing—a human’s MAV3D-like thing is emergent / derivative from the other stuff. Probably.
        Gram Stone 29 Jul 2025 6:03 UTC
        1 point
        0
        Parent
        I agree with this as an object-level observation on the usefulness of MAV3D itself, but also have a Lucas critique of the Bitter Lesson that ultimately leads me to different conclusions about what this really tells us.
        
        I think of EURISKO, Deep Blue, and AlphaGo/Zero as slightly discordant historical examples that you could defy, but on my view they are subtle sources of evidence supporting microfoundations of cognitive returns on cognitive reinvestment that are inconsistent with Sutton’s interpretation of the observations that inspired him to compose The Bitter Lesson.
        EURISKO is almost a ghost story, but if the stories are true, then this doesn’t imply that N clever tricks would’ve allowed EURISKO to rule the world, or even that EURISKO is better classified as an AI as opposed to an intelligence augmentation tool handcrafted by Lenat to complement his particular cognitive quirks, but Lenat + EURISKO reached a surprising level of capability quite early. Eliezer seems to have focused on EURISKO as an early exemplar of cognitive returns on recursive self-improvement, but I don’t think this is the only interesting frame.
        It’s suggestive that EURISKO was written in Interlisp, as the homoiconic nature of LISPs might have been a critical unhobbling. That is to say, because Lenat’s engineered heuristics were LISP code, by homoiconicity they were also LISP data, an advantage that Lenat fearlessly exploited via macros, and by extension, domain specific languages. It also appears that Lenat implemented an early, idiosyncratic version of genetic algorithms. EURISKO was pretty close to GOFAI, except perhaps for the Search, but these descriptions of its architecture strongly suggest some intuitive appreciation by Lenat of something akin to the Bitter Lesson, decades before the coining of that phrase. It looks like Lenat figured out how to do Search and Learning in something close to the GOFAI paradigm, and got surprisingly high cognitive returns on those investments, although perhaps I have just made them seem a little less surprising. Of course, in my view, Lenat must not have fully appreciated the Lesson, as he spent the rest of his career working on Cyc. But for a little while at least, Lenat walked a fine line between the version of Engineering that doesn’t work, and the version that (kind of) does.
        I would compare this distinction to the presence of folk semantics in all natural languages, and the absence of folk syntax. Parts of semantics are introspectively accessible to humans, so introspection permits veridical folk descriptions of semantics (informal, useful descriptions of what is true, possible, etc.), but the generator of syntax is introspectively inaccessible to humans, so generating veridical folk descriptions of syntax is much harder, if not impossible, via the same mechanism we applied in the case of folk semantics, thus successful computational modeling of syntax for the most part requires Science/Bayes (e.g. Linguistics). In my view, EURISKO was indeed mostly invented/discovered with Science/Bayes rather than Introspection, but this was hard for Lenat to tease out post mortem, and then he went way too far in the Introspection direction, failing to appreciate that most if not all of his cognitive returns came from mostly implicit, successful Science/Bayes (like mathematicians), which from the inside is hard to distinguish from successful Introspection. But Lenat’s ostensible error does not explain away the cognitive returns observed in the case of EURISKO, if we have in fact observed any.
        Deep Blue demonstrated significant cognitive returns from massively parallel alpha-beta pruning + engineered evaluation functions and opening/endgame heuristics. Arguably, these were functions as opposed to data, but if we maintain the LISP/FP mindset for a moment, functions and source code are data. I can squint at Deep Blue as an exemplar of ‘feature engineering’ ‘working’ i.e., large allocations of engineering effort on a ‘reasonable’ (to humans) timescale, in concert with unprecedentedly ambitious hardware allocation/specialization and parallelism, permitting cognitive returns on cognitive reinvestment to exceed a critical threshold of capability (i.e. beating Kasparov even once, possibly even on an off-day for Kasparov).
        Crucially, not all engineered features are brittle (or even unscalable, with respect to a concrete capability target, which is my model of Deep Blue), and not all learned features are robust (or even scalable, again with respect to a concrete capability target, which is my model of how DQN didn’t solve (i.e. meet the human expert capability threshold in the domain of) Go before AlphaGo (Go board state Search in ‘latent space’ was not ‘tractable’ with those permutations of compute, data, and (Learning) algorithm)), which might explain a thing or two about the weird competence profile of LLMs as well.
        All of this to say, I don’t think about cognitive returns in a way that demands a fundamentally sharp distinction between Learning and Engineering, even if it’s been qualitatively pretty sharp under most historical conditions, nor do I think about cognitive returns in a way that forbids significant but reasonable amounts of pretty mundane engineering effort pushing capabilities past a critical threshold, and crucially, if that threshold is lethal, then you can die ‘without’ Learning.
        As I hope might become especially clear in the case of AlphaGo/AlphaZero, I think the architectural incorporation of optimally specific representations can also significantly contribute to the magnitude of cognitive returns, as observed, I claim, in the cases of Deep Blue and EURISKO, where board states and domain specific languages respectively constituted strong priors on optimal actions when appropriately bound to other architectural modules (notably, Search modules in each case), and were necessary for the definition of evaluation functions.
        I think a naive interpretation of the Bitter Lesson seems at first glance to be especially well-supported by the observed differences in capability and generality between the Alpha-series of architectures. You combine the policy and value networks into one, two-headed network, stop doing rollouts, and throw away all the human data, and it’s better, more capable and more general, it can perform at superhuman level in multiple perfect information, zero-sum, adversarial games besides Go (implicitly given their Rules in the form of the states, actions, and transition model of an MDP of course), and beat earlier versions of itself. But we also did the opposite experiments (e.g. policy head only Leela Chess Zero at inference time) and again an architecture with a Learning module but no Engineered Search module produced significantly lower cognitive returns than a similar architecture that did have an Engineered Search module.
        Then MuZero wiped the board by tractably Learning the Rules in the latent space, but only four years after AlphaGo had reached the first target capability, and it still used Engineered Search (MCTS). To my knowledge, we are trying to build things that learn better search algorithms than we can engineer ourselves, but we still aren’t there. I’m not even claiming this wouldn’t work eventually, or that an engineered AGI architecture will be more general and more capable than a learned AGI architecture, I just think someone will build the engineered architecture first and then kill everyone, before we can learn that architecture from scratch and then kill everyone. On my model, returns on Search haven’t been a fundamental constraint on capability growth since right before EURISKO. On the other hand, returns on Engineered Game Rules (state space, action space, transition model), Compute, Data, and Learning have all been constraints under various historical conditions.
        So I guess my model says that ‘merely static representations’ of semantic volumetric histories will constitute the first optimally specific board states of Nature in history, and we will use them to define loss functions so that we can do supervised learning on human games (recorded volumetric episodes) and learn a transition model (predictive model of the time evolution of recorded volumetric episodes, or ‘next-moment prediction’) and an action space (generative model of recorded human actions), then we will combine this with Engineered Search and some other stuff, then solve Go (kill everyone). Four years later something more capable and more general will discover this civilization from first principles and then solve Go, Chess, and Shogi with one architecture (kill everyone), and this trend will have been smooth.
        If this isn’t pretty much exactly what Abram had in mind when he wrote:
        Very roughly speaking, the bottleneck here is world-models. Game tree search can probably work on real-world problems to the extent that NNs can provide good world-models for these problems. Of course, we haven’t seen large-scale tests of this sort of architecture yet (Claude Plays Pokemon is even less a test of how well this sort of thing works; reasoning models are not doing MCTS internally).
        then I might have to conclude that he and I have come to similar conclusions for completely different reasons.
        
        I feel like explicit versions of the microfoundations implied by naive interpretations of the Bitter Lesson would falsely retrodict that we couldn’t beat Kasparov even once without ubiquitous adoption of ReLUs and late oughts/early teens amounts of compute, that DQN was sufficient to solve Go, and that EURISKO is a ghost story.
        
        I didn’t talk about the neurobiology of constructive episodic simulation in humans at all, but would be willing to do so, and I think my model of that is also consistent with my microfoundations.
        TsviBT 29 Jul 2025 6:51 UTC
        2 points
        0
        Parent
        
        So I guess my model says that ‘merely static representations’ of semantic volumetric histories will constitute the first optimally specific board states of Nature in history, and we will use them to define loss functions so that we can do supervised learning on human games (recorded volumetric episodes) and learn a transition model (predictive model of the time evolution of recorded volumetric episodes, or ‘next-moment prediction’) and an action space (generative model of recorded human actions), then we will combine this with Engineered Search and some other stuff, then solve Go (kill everyone).
        
        I think getting this to work in a way that actually kills everyone, rather than merely is AlphaFold or similar, is really really hard—in the sense that it requires more architectural insight than you’re giving credit for. (This is a contingent claim in the sense that it depends on details of the world that aren’t really about intelligence—for example, if it were pretty easy to make an engineered supervirus that kills everyone, then AlphaFold + current ambient tech could have been enough.) I think the easiest way is to invent the more general thing. The systems you adduce are characterized by being quite narrow! For a narrow task, yeah, plausibly the more hand-engineered thing will win first.
        
        Back at the upthread point, I’m totally baffled by and increasingly skeptical of your claim to have some good reason to have a non-unimodal distribution. You brought up the 3D thing, but are you really claiming to have such a strong reason to think that exactly the combination of algorithmic ideas you sketched will work to kill everyone, and that the 3D thing is exactly most of what’s missing, that it’s “either this exact thing works in <5 years, or else >10 years” or similar?? Or what’s the claim? IDK maybe it’s not worth clarifying further, but so far I still just want to call BS on all such claims.