Man, the thing that is so wuckles about LLMs and their lack of agency is… like, I get why “creativity” is hard. But relentlessness is… so brute-forceable? Just… just fucking keep doing stuff, man.
I guess this actually does work when you spend enough compute, and then you can solve IMO gold problems.
Also, I recall the real bottleneck here being LLM discernment, which does make sense as a fundamental bottleneck.
That’s the easy part of relentlessness. LLMs already often get stuck in loops, trying increasingly hopeless things while making no real progress.
Nod, seems like that falls under “actually the bottleneck is discernment”? (Albeit “discernment” is vague and could be decomposed further.)
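To make the loop-stuckness concrete: it is cheap for a scaffold to at least notice when an agent is spinning. A toy sketch, where `is_stuck` and the exact-match window are made-up illustrations (real attempts would need fuzzy matching, not string equality):

```python
def is_stuck(attempts, window=4):
    """Crude loop detector: the agent counts as 'stuck' if its most recent
    attempts are all repeats of things it already tried earlier. A scaffold
    could run a check like this before letting the agent keep grinding."""
    if len(attempts) < window:
        return False
    recent = attempts[-window:]
    earlier = set(attempts[:-window])
    return all(a in earlier for a in recent)

history = ["try pip install", "edit config", "try pip install",
           "edit config", "try pip install", "edit config"]
print(is_stuck(history))  # True: the last attempts just repeat earlier ones
```

Of course, detecting the loop is the easy half; deciding what to try *instead* is exactly the discernment problem again.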
I agree that discernment is necessary (so maybe expand to RCRD?).
This lens is pretty clarifying, I think, relative to repeatedly pointing out that “agency” in the sense of just relentlessly pursuing a goal is trivially easy to add via scaffolding (so not the missing piece many people think it is), and that LLMs are creative as hell. They might need a little prompting to get creative enough, but again, that’s trivial.
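For concreteness, here is a minimal sketch of what “agency via scaffolding” amounts to. The names are all hypothetical: `propose` stands in for an LLM call and `score` stands in for the discernment step; here they are deterministic toys so the loop actually runs:

```python
import random

def propose(task, rng):
    """Stand-in for an LLM call that proposes a candidate solution."""
    return rng.randint(0, 100)

def score(task, candidate):
    """Stand-in for discernment: judge how good a candidate is.
    (Here, distance to a known target; in reality this is the hard part.)"""
    return -abs(candidate - task["target"])

def relentless_solve(task, budget=2000, seed=0):
    """Relentlessness is just a loop: keep proposing until the budget runs
    out, and return whichever candidate the scorer liked best."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(budget):
        candidate = propose(task, rng)
        s = score(task, candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best

print(relentless_solve({"target": 42}))
```

The `for` loop is the whole of “relentless pursuit.” Everything interesting is hiding inside `score`, which is the thread’s point about discernment being the bottleneck.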
Hm, what about “relentless creative refinement”, since I’m not sure what “resourcefulness” directly points at?
Anyway, discernment does seem like the limiting factor: you’ve got to discern which of your relentlessly creative efforts are most worth pursuing further. I think “discernment” is a somewhat better term than the others I’ve seen used for this missing capability, and getting the right term seems worthwhile.
The following are just some follow-on thoughts on the path to discernment in agentic LLMs, and therefore on timelines. Perhaps this will be fodder for a future post. It’s pretty divergent from the original topic, so feel free to ignore it.
Thinking about how humans acquire discernment in a given area should give some clues as to how hard it would be to add that to agentic LLMs.
Humans sometimes do discernment (IMO) with a bunch of very complex, explicit System 2 analysis of a situation, to get a decent guess at whether an approach is good or working. Over enough examples and experiences, we can learn to compile those many judgments into effortless, mysterious intuitive judgments (I guess that’s closer to how “discernment” is usually used). Or we get enough data and experience to compile a faster rubric, like “I think those pants are fashionable because something else that person is wearing seems fashionable.”
So it’s either a bunch of online learning specific to a situation, OR careful analysis following strategies and formulas that are less situation-specific and more general, but quite time-consuming. For instance, Google’s co-scientist project has highly complex scaffolding to create, evolve, and evaluate scientific hypotheses, including discerning their worth against the literature and in other ways. And it seems to work. But that system doesn’t have the continuous learning needed to compile those laborious judgments into better intuitions. It’s unclear how far you could get by fine-tuning on the results of such judgments in a given domain.
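The “evaluate hypotheses against each other” part can be sketched as a tiny round-robin tournament. This is my gloss on the idea, not Google’s actual implementation: `compare` stands in for an LLM judging two hypotheses head to head, and here it just reads a hidden quality score so the example is deterministic:

```python
from itertools import combinations

def compare(a, b):
    """Stand-in for an LLM judge: return the hypothesis it deems stronger.
    Hypotheses are (claim, quality) pairs and this toy judge reads the
    hidden quality directly; a real judge would be noisy and expensive."""
    return a if a[1] >= b[1] else b

def tournament_rank(hypotheses):
    """Round-robin tournament: compare every hypothesis against every other
    and tally wins. The ranking is only as good as `compare`, which is the
    discernment bottleneck the thread is pointing at."""
    wins = {h: 0 for h in hypotheses}
    for a, b in combinations(hypotheses, 2):
        wins[compare(a, b)] += 1
    return sorted(hypotheses, key=lambda h: wins[h], reverse=True)

hyps = [("protein X folds via pathway A", 0.3),
        ("mechanism B drives resistance", 0.9),
        ("gene C is a side effect", 0.6)]
print(tournament_rank(hyps)[0][0])  # the judge's top pick: mechanism B
```

The scaffolding scales the number of comparisons with compute; whether the winner is actually a good hypothesis depends entirely on the judge.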
The other approach would be to create datasets that include many more (and better) value judgments than text corpora usually do. I don’t know how easy or hard those would be to create.
To me this suggests that adding discernment isn’t trivial, but also doesn’t require breakthroughs to add some, leaving the question of how much discernment is enough for any given purpose.