In their most straightforward form (“foundation models”), language models are a technology which naturally scales to something in the vicinity of human-level (because it’s about emulating human outputs), not one that naturally shoots way past human-level performance
For a more detailed analysis of how this problem could be overcome, and why doing so is unlikely to be a fast process, see my post LLMs May Find it Hard to FOOM. (Later parts of your post overlap somewhat with it, but it covers some specifics, such as conditioning and extrapolation, that you don’t discuss, so readers will find some more useful content there.)