I think we also care about how fast it gets arbitrarily capable. Consider a system which finds an approach which can measure approximate actions-in-the-world-Elo (where an entity with an advantage of 200 on their actions-in-the-world-Elo score will choose a better action 76% of the time), but it’s using a “mutate and test” method over an exponentially large space, such that the time taken to find the next 100 point gain takes 5x as long, and it starts out with an actions-in-the-world-Elo 1000 points lower than an average human with a 1 week time-to-next-improvement. That hypothetical system is technically a recursively self-improving intelligence that will eventually reach any point of capability, but it’s not really one we need to worry that much about unless it finds techniques to dramatically reduce the search space.
Like I suspect that GPT-4 is not actually very far from the ability to come up with a fine-tuning strategy for any task you care to give it, and to create a simple directory of fine-tuned models, and to create a prompt which describes to it how to use that directory of fine-tuned models. But fine-tuning seems to take an exponential increase in data for each linear increase in performance, so that’s still not a terribly threatening “AGI”.
Let’s say we have a language model that only knows how to speak English and a second one that only knows how to speak Japanese. Is your expectation that there would be no way to glue these two LLMs together to build an English-to-Japanese translator such that training the “glue” takes <1% of the compute used to train the independent models?
I weakly expect the opposite, largely based on stuff like this, and based on playing around with using algebraic value editing to get an LLM to output French in response to English (but also note that the LLM I did that with knew English and the general shape of what French looks like, so there’s no guarantee that result scales or would transfer the way I’m imagining).