What Separates an Optimizer From Something We Merely Describe as Optimizing?
Reading through the AI Control posts, I notice that much if the discussion assumes a distinction between a system executing a policy and a system pursuing an outcome despite constraints.
Is there a generally accepted threshold where LessWrong would say a system has crossed from “optimization as description” into optimization as agent”?
Or is the distinction itself considered observer-relative?
Being able to pin this down exactly is kind of an open research question. (i.e. “what is an agent?” and “what is an optimizer”). But, roughly, things are “more optimizer-like” the more they successfully converge on a target no matter what starting conditions you put them in, and no matter what obstacles you throw in their way.
Some previous posts:
Bottle Caps Aren’t Optimisers
The ground of optimization
LLM base models in their raw form are less “optimizer-y” because, while clearly intelligent, if you change their initial prompt they will end up doing radically different things instead of converging to the same thing. (compared to AlphaGo which always tries to win games of go).
That’s helpful, thank you.
One thing I find interesting is that many of these distinctions seem to become fuzzier the closer we get to systems that are capable of adapting across contexts. It starts to feel less like a binary question (“optimizer” or “not optimizer”) and more like a question of degree.
I’ll take a look at those posts.