My theory on why AI isn’t creative is that it lacks a ‘rumination mode’. Ideas can sit and passively connect in our minds for free. This is cool and valuable. LLMs don’t have that luxury. Non-linear, non-goal-driven thinking is expensive and not effective yet.
Cross-posted from X
Yes. See Google’s Co-Scientist project for an example of an AI scaffolded to have a rumination mode. It is claimed to have matched the theory creation of top labs in two areas of science.
So this rumination mode is probably expensive, and so far it’s only claimed to be effective in the domain it was engineered for. But based on the scaffolded, sort-of-evolutionary “algorithm” they used to recombine hypotheses and test them against published empirical results, I’d expect a general version to work almost as well across domains, once somebody puts effort and some inference money into making it work.
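Very roughly, I picture that scaffolding as something like the loop below. This is just my guess at the general shape, not the actual Co-Scientist implementation; the function names, prompts, and scoring step are placeholders, and `llm` stands for any prompt-in, text-out call.

```python
import random

def propose(llm, topic, parents=None):
    """Ask the LLM for a new hypothesis, optionally recombining parent hypotheses."""
    if parents:
        return llm(f"Combine or refine these hypotheses about {topic}: {parents}")
    return llm(f"Propose a novel, testable hypothesis about {topic}.")

def score(llm, hypothesis, literature):
    """Placeholder: have the LLM grade how well the hypothesis fits published results."""
    return float(llm(f"Rate 0-10 how well this fits the evidence.\n"
                     f"Hypothesis: {hypothesis}\nEvidence: {literature}"))

def ruminate(llm, topic, literature, population=20, generations=10):
    """Evolutionary loop: generate, score against the literature, keep the best, recombine."""
    pool = [propose(llm, topic) for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(pool, key=lambda h: score(llm, h, literature), reverse=True)
        survivors = ranked[: population // 2]
        children = [propose(llm, topic, parents=random.sample(survivors, 2))
                    for _ in range(population - len(survivors))]
        pool = survivors + children
    return max(pool, key=lambda h: score(llm, h, literature))
```

The expense is easy to see here: every generation is many LLM calls, which is presumably where a per-hypothesis cost in the hundreds of dollars comes from.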
This is cool and valuable, as you say. It’s also extremely dangerous, since this lack is one of the few gaps between current LLMs and the general reasoning abilities of humans—without human ethics and human limitations.
Caveat: I haven’t closely checked the credibility of the co-scientist breakthrough story. Given the source, I think it’s unlikely to be entirely fake or overstated, but draw your own conclusions.
So far I’ve drawn my conclusions primarily from this podcast interview with the creators and from a deep research report based largely on this paper on the co-scientist project.
It looks like Nathan Labenz, the host of that podcast (and an AI expert in his own right), estimates the inference cost at $100-1000 for one cutting-edge hypothesis based on the literature, in this followup episode (which I don’t recommend, since it’s focused on the actual biological science).
Do you mean something like:
Suppose a model learns “A->B” and “B->C” as separate facts. These get stored in the weights, probably somewhere across the feedforward layers. They can’t be combined unless both facts are loaded into the residual stream/token stream at the same time, which might not happen. And even if it does, the model won’t remember “A->C” as a standalone fact in the future; it has to re-compute it every time.
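Or, as a toy lookup-table analogy (nothing like real transformer internals, just the shape of the problem):

```python
# Toy analogy: two facts learned separately and stored as independent lookups ("the weights").
facts = {"A": "B", "B": "C"}

def one_step(token):
    """One retrieval step: it can only use a key that is already present in context."""
    return facts.get(token)

# A->C only falls out if the intermediate "B" gets surfaced into the context
# (the residual/token stream) during the same pass:
intermediate = one_step("A")      # surfaces "B"
answer = one_step(intermediate)   # now "C" is reachable
print(answer)                     # "C" -- re-derived on the spot

# Nothing writes "A" -> "C" back into `facts`, so the two-step chain
# has to be recomputed from scratch every time it comes up.
```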
Sure. But more than the immediate, associative leaps, I think I’m interested in their ability to sample concepts across very different domains and find connections, whether that’s done deliberately or randomly. Though with humans, the ideas that plague our subconscious are tied to our persistent, internal questions.
Gwern’s made some suggestions along similar lines.