Davidmanheim comments on Modeling versus Implementation

Davidmanheim 20 May 2025 6:24 UTC
LW: 5 AF: 2
“As I understand it, MIRI intended to build principled glass-box agents based on Bayesian decision theory.”

I think this misunderstands the general view of agent foundations by those who worked on it in the past. That is, “highly reliable agent design” was an eventual goal, in the same sense that someone taking high-school physics wants to use it to build rockets—they (hopefully) understand enough to know that they don’t know enough, and will need to learn more before even attempting to build anything.

That’s why Eliezer talked so much about deconfusion. The idea was to figure out what they didn’t know. This led to later talk of building safe AI as an eventual goal: not a plan, but a possible eventual outcome if they could figure out enough. They clarified this view, and it was mostly understood by funders. I also helped Issa Rice write a paper laying out the different pathways by which it could help, and only two of those involved building agents.

And why did they give it up? Largely because they found that the deconfusion work was so slow, and everyone was so fundamentally wrong about the basics, that as LLM-based systems were developed they didn’t think we could possibly build the reliable systems in time. They didn’t think that Bayesian decision theory or glass-box agents would necessarily work, and they didn’t know what would. So I think “MIRI intended to build principled glass-box agents based on Bayesian decision theory” is not just misleading, but wrong.