That was almost 3 years ago.
If there’s not a better example by now, it was probably a fluke.
Then again, even humans aren’t especially reliable at producing big insights, and it’s not trivial even for very top-tier humans to be both creative and have significant impact on the world.
To be clear, I think GPT-n is worse than humans in this regard. But it’s generally bad practice to compare humans and AIs by trying to show that one or the other can or can’t do something at all, and treating capabilities as binary, where an AI either has a capability or doesn’t, has done real harm to AI discourse.
There’s some reason to think in terms of thresholds, for example long-tail tasks that require high reliability, but in general I’m much more skeptical of the need to find a deep reason why current AI might fail to automate AI research or take over the world; if LLMs stall out, I think a lot of the reason will be pretty prosaic.
Link below:
https://www.lesswrong.com/posts/Nbcs5Fe2cxQuzje4K/value-of-the-long-tail
No, humans do this all the time, constantly, originarily (https://www.lesswrong.com/posts/5tqFT3bcTekvico4d/do-confident-short-timelines-make-sense#Creativity___Originariness), starting when they are kids. They keep using roughly the same set of faculties on harder and harder problems, including sometimes producing globally novel insights. Gippities learn in a different way, one that does not go on to do that. Sample complexity is one way to notice that it’s a different way.
I am quite surprised that this happened 3 years ago! That seems really impressive for the GPT series of 3 years ago, and I’d expect the models to have gotten better since. Yes, it might be a fluke, but wouldn’t we expect current models to have a higher chance of producing a fluke this good?
Then why isn’t there a better example from a year ago?