What is an example of something useful you think could in theory be done with current models but isn’t being elicited in favor of training larger models?
I think you may be making an argument along the lines of “But people are already working on this! Markets are efficient!”
To which my response is “And thus, 3 years from now, we’ll know how to do much more with models than we do now, even if you personally can’t come up with an example now. The same way that we now know how to do much more with models than we did 3 years ago.”
Unless you expect this not to be the case, e.g. from having directly worked on juicing models or from following people who have done so and failed, you shouldn’t really expect your failure to come up with such examples to be informative.
I meant examples of concrete tasks that current models fail at as-is, but where you think the capability could be elicited, e.g. with some general scaffolding.
I think you may be making an argument along the lines of “But people are already working on this! Markets are efficient!”
Not quite. Though I am not completely confident, my claim comes from the experience of watching models fail at a bunch of tasks for reasons that seem related to raw intelligence rather than to scaffolding or elicitation. For example, in my experience current models struggle to answer questions about codebases with nontrivial logic that they haven’t seen before, or to fix difficult bugs.
When you use current models, you run into examples where you can feel how dumb the model is, how shallow its thinking is, how much it’s relying on heuristics. Scaffolding and the like can only help so much.
Also, elicitation techniques are often task-specific rather than general (e.g. coming up with the perfect prompt with detailed instructions), and they require human labor and intelligence to craft. That additional effort and information takes away from how much the AI itself is doing.
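To make the distinction concrete, here is a rough sketch (Python; `complete()` is a made-up placeholder for whatever model API is in use, and the prompts are purely illustrative): the first helper is the kind of hand-crafted, task-specific prompting I mean, where a human supplies the strategy; the second is a task-agnostic draft-critique-revise scaffold, where a human supplies only the task statement.

```python
# Illustrative only: `complete` is a hypothetical stand-in for a model API call.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire this up to an actual model API")


# Task-specific elicitation: a hand-crafted prompt that encodes a detailed,
# human-supplied strategy for one particular task (here, bug fixing).
def fix_bug_task_specific(code: str, failing_test: str) -> str:
    prompt = (
        "You are debugging a module. Read the code, explain why the test fails, "
        "list three candidate root causes, pick the most likely one, and output "
        "a minimal patch.\n\n"
        f"CODE:\n{code}\n\nFAILING TEST:\n{failing_test}\n"
    )
    return complete(prompt)


# General scaffolding: a task-agnostic draft -> critique -> revise loop.
# The human supplies only the task statement, not a task-specific strategy.
def solve_with_general_scaffold(task: str, rounds: int = 2) -> str:
    answer = complete(f"Task:\n{task}\n\nGive your best answer.")
    for _ in range(rounds):
        critique = complete(
            f"Task:\n{task}\n\nProposed answer:\n{answer}\n\n"
            "List concrete flaws or gaps in this answer."
        )
        answer = complete(
            f"Task:\n{task}\n\nPrevious answer:\n{answer}\n\n"
            f"Critique:\n{critique}\n\nRevise the answer to address the critique."
        )
    return answer
```

My claim is that the second kind of loop buys you less than people hope when the underlying failure is the model’s reasoning, and the first kind quietly moves a chunk of the intelligence back to the human.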
Better prompt engineering, fine-tuning, interpretability, scaffolding, sampling.
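To illustrate the “sampling” item on that list, here is a minimal self-consistency-style sketch (again with a made-up `sample_answer()` placeholder; nothing here is from an actual library): draw several independent answers at nonzero temperature and keep the most common one.

```python
from collections import Counter


# Hypothetical placeholder for a sampled model call (temperature > 0).
def sample_answer(prompt: str, temperature: float = 0.8) -> str:
    raise NotImplementedError("wire this up to an actual model API")


# Sampling as elicitation: majority vote over independent samples
# (works best when answers are short and directly comparable,
# e.g. a number or a multiple-choice letter).
def majority_vote_answer(prompt: str, n: int = 10) -> str:
    answers = [sample_answer(prompt).strip() for _ in range(n)]
    best, _count = Counter(answers).most_common(1)[0]
    return best
```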