Yeah, but, the reason I have my current combination of beliefs/confusions is that LLMs are demonstrably able to execute on each individual mental move I’d expect a smart person to do (and to chain 2-3 of such actions reasonably).
The thing that is missing is them being able to keep applying them when situationally appropriate. It feels like scaffolding should be able to solve this. I’ve heard some things about Claude Bureaucracy setups that do pieces of this, where a watchdog is checking “does the main agent seem to be making [various kinds of mistakes]” or “is this a good time to do [specific thing].”
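A minimal sketch of the kind of watchdog loop I mean (the prompts, the checking cadence, and the `call_llm` helper are all hypothetical stand-ins for illustration, not taken from any actual Claude Bureaucracy setup):

```python
# Hypothetical sketch: a second model instance periodically audits the main
# agent's recent steps and injects a corrective note when it flags a problem.

def call_llm(system: str, prompt: str) -> str:
    # Stand-in for whatever completion API is actually in use.
    raise NotImplementedError("wire up a real model call here")

WATCHDOG_SYSTEM = (
    "You audit another agent's work. Reply 'OK', or describe the mistake it "
    "seems to be making (looping, drifting from the goal, skipping checks)."
)

def run_with_watchdog(task: str, max_steps: int = 20, check_every: int = 3) -> list[str]:
    transcript: list[str] = []
    for step in range(max_steps):
        # The main agent takes one step given the task and its own history so far.
        action = call_llm(
            "You are the main agent; take the next step on the task.",
            f"Task: {task}\nHistory:\n" + "\n".join(transcript),
        )
        transcript.append(action)

        # Every few steps the watchdog reviews the recent history.
        if step % check_every == check_every - 1:
            verdict = call_llm(WATCHDOG_SYSTEM, "\n".join(transcript[-check_every:]))
            if not verdict.strip().startswith("OK"):
                # Fold the watchdog's feedback back into the main agent's context.
                transcript.append(f"[watchdog note] {verdict}")
    return transcript
```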
LLMs are demonstrably able to execute on each individual mental move I’d expect a smart person to do
For the record, I think this is bigtime streetlighting. There’s a bunch of mental moves, broadly construed. Then there’s a subset which is reflected in what you notice when you watch LLMs or humans doing stuff. I think it’s a pretty “small” subset, in the sense that you’re looking at a “surface”, or you’re looking at products rather than manufacturing processes, so to speak (consider the complexity of a toaster vs. the complexity of the transitive closure of a toaster under the operation “...and also the technological concepts needed to create that”). I think you can tell it’s small by:
Thinking about the role of language (what it is and isn’t for, within Thinking)
Noticing that when you zoom in on your thoughts, most of the important stuff happens in little leaps that are opaque to you
(Various things about LLMs not being creative)
Like I said: distribution mismatch.
An LLM can do X successfully when prompted to, but doesn’t know it should do X without any prompting cues (metacognitive gap), doesn’t know how to tell when X is appropriate over long contexts (metacognitive gap, distribution mismatch), and doesn’t know how to handle long contexts well in general.
Things like “set a watchdog LLM to check on the main LLM” show that the LLM itself isn’t fundamentally incapable. If slim scaffolding that externalizes the desired behavior gets the same model to do it, you could just distill, folding the scaffolding-induced behavior back into the model itself. It has the capacity to learn to do it. It just doesn’t do it by default.
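To illustrate that distillation step, a rough sketch (here `run_scaffolded_agent` stands in for something like the watchdog loop sketched earlier, and the JSONL format is just one common choice, not a specific recipe):

```python
import json

def run_scaffolded_agent(task: str) -> list[str]:
    # Stand-in for a scaffolded run, e.g. the run_with_watchdog sketch above.
    raise NotImplementedError

def collect_distillation_data(tasks: list[str], out_path: str) -> None:
    # Collect trajectories produced under the scaffold, stripped of the
    # scaffold's own interjections, as supervised fine-tuning targets.
    with open(out_path, "w") as f:
        for task in tasks:
            transcript = run_scaffolded_agent(task)
            target = "\n".join(
                line for line in transcript if not line.startswith("[watchdog note]")
            )
            f.write(json.dumps({"prompt": f"Task: {task}", "completion": target}) + "\n")
```

A supervised fine-tune on that file is then the “fold it into the model” step: the bare model learns to produce the scaffold-shaped behavior without the external checker present.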
Yes, and my point is “I would have expected this to work with scaffolding, and, therefore, I also expect there to be ways of getting the same behavior out of the core AI model.” (I didn’t say the second part before, it’s more like ‘it’s not really a crux for me whether you solve this via scaffolding or training, the raw ingredients seem to be there.’)
BUT, the fact that we’re still seeing some problems with this is somewhat surprising to me, and I’m not sure if the update is more like “my underlying model here is wrong” or more like “well it takes more sweat and elbow grease and scaling to leverage the existing capabilities in a generalized way.”
You “would have expected” that and you would be wrong.
Doesn’t matter if it’s just 32 bits worth of connections—it’s 32 bits worth of connections that aren’t currently present in the model. Nothing fundamental stops them from being present. It’s just that no one burned them in with training, so they aren’t there.
Finding all the ways to generalize from improved prompts and scaffolding into improved models isn’t at all trivial. And people aren’t at the point of “search and refinement at scale” yet.
Downvoted because your string of comments here feels weirdly aggro and doesn’t feel like it’s really engaging with what I’m saying. (Like, yes, I am saying I am confused and presumably wrong about something. You seem to want to go hard on saying “yes you’re wrong!” and I’m like “yes, I know?”)