I meant examples of concrete tasks that current models fail at as-is but you think could be elicited, e.g. with some general scaffolding.
I think you may be making an argument along the lines of “But people are already working on this! Markets are efficient!”
Not quite. Though I am not completely confident, my claim comes from the experience of watching models fail at a bunch of tasks for reasons that seem related to raw intelligence rather than to scaffolding or elicitation. For example, in my experience current models struggle to answer questions about codebases with nontrivial logic they haven’t seen before, or to fix difficult bugs.
When you use current models, you run into examples where you can feel how dumb the model is, how shallow its thinking is, and how much it’s relying on heuristics. Scaffolding etc. can only help so much.
Also, elicitation techniques are often task-specific rather than general (e.g. coming up with the perfect prompt with detailed instructions), requiring human labor and intelligence for each new task. That additional effort and information takes away from how much the AI is doing on its own.
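To make that distinction concrete, here is a rough sketch (everything in it is hypothetical: `llm` stands in for whatever model API you’re using, and `solve_with_retries` / `RACE_CONDITION_PROMPT` are illustrative names). The retry loop is the kind of general, reusable scaffolding I mean; the prompt is the kind of hand-crafted, per-task elicitation that quietly offloads work onto the human:

```python
from typing import Callable, Optional

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever model API is in use."""
    raise NotImplementedError

def solve_with_retries(task: str, check: Callable[[str], bool],
                       max_attempts: int = 3) -> Optional[str]:
    """General scaffolding: a task-agnostic ask/verify/retry loop."""
    feedback = ""
    for _ in range(max_attempts):
        answer = llm(f"{task}\n{feedback}\nThink step by step, then answer.")
        if check(answer):
            return answer
        feedback = f"Your last answer ({answer!r}) failed the check. Try again."
    return None

# Task-specific elicitation: a human bakes domain insight into the prompt.
# Writing this takes labor and intelligence for each new task.
RACE_CONDITION_PROMPT = """\
You are debugging a race condition in a job queue.
1. List every shared variable and which threads touch it.
2. For each one, say whether its accesses are synchronized.
3. Only then propose a fix.

Code:
{code}
"""
```

The loop transfers to any task with a checkable answer; the prompt only works because a human already knew that races come from unsynchronized shared state.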