People would need to basically roll a lot of dice on architectures, in the hopes of hitting upon something that works.
How much is RSI (recursive self-improvement) going to help here? This is already what everyone does for hyperparameter searches: train another network to do them. An AGI architecture, aka "find me a combination of models that will pass this benchmark," seems like it would be solvable with such a search.
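A minimal sketch of the kind of search I mean, assuming the whole thing can be treated as black-box optimization (the block names and score_on_benchmark are hypothetical stand-ins; in reality the scoring step is a full training-plus-evaluation run):

```python
import random

# Hypothetical building blocks the search could combine (toy stand-ins,
# not real components).
BLOCKS = ["retrieval", "planner", "critic", "tool_use", "long_context", "verifier"]

def score_on_benchmark(architecture, rng):
    """Stand-in for the expensive step: train/finetune the candidate and
    evaluate it on the target benchmark. Here it's just a toy scoring rule."""
    bonus = 2.0 if {"planner", "critic"} <= set(architecture) else 0.0
    return len(architecture) + bonus + rng.gauss(0, 0.5)

def random_search(n_trials=200, max_blocks=4, seed=0):
    """Plain black-box search over combinations of components, the same way
    hyperparameter search is usually automated."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = rng.sample(BLOCKS, rng.randint(1, max_blocks))
        score = score_on_benchmark(candidate, rng)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

best, score = random_search()
print(f"best combination found: {best} (score {score:.2f})")
```

The outer loop needs essentially no insight, only the compute to pay for each scoring call.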
The way I model it, RSI would let GPU-rich but more mediocre devs find AGI. They won't be first unless, hypothetically, they don't get the support of the S-tier talent, say because they are in a different country.
Are you sure timelines with "decades" of delay are really possible, if open-source models exist and GPUs exist in ever-increasing quantities and ever-greater power?
I expect that sort of brute-force-y approach to take even longer than the “normal” vision-less meandering-around.
Well, I guess it can be a hybrid. The first-to-AGI would be some group that maximizes the product of "has any idea what they're doing" and "how much compute they have" (rather than either variable in isolation). Meaning (toy numerical sketch below):
Compute is a “great equalizer” that can somewhat compensate for lack of focused S-tier talent.
But focused S-tier talent can likewise somewhat compensate for having less compute.
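A toy numerical version of that product model (the numbers, and the assumption that progress is literally talent times compute, are purely for illustration):

```python
def years_to_agi(talent, compute, difficulty=100.0):
    """Toy model: progress per year is the product talent * compute,
    so the time to cross a fixed difficulty threshold is its inverse."""
    return difficulty / (talent * compute)

teams = {
    "S-tier talent, modest compute": (10.0, 2.0),
    "mediocre talent, GPU-rich":     (2.0, 10.0),
    "S-tier talent, GPU-rich":       (10.0, 10.0),
}
for name, (talent, compute) in teams.items():
    print(f"{name}: ~{years_to_agi(talent, compute):.0f} years")
```

In this framing, un-focusing the S-tier talent just lowers the first factor, which is the "lengthens the timelines" effect mentioned below.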
That seems to agree with your model?
And my initial point is that un-focusing the S-tier talent would lengthen the timelines.
Are you sure timelines with "decades" of delay are really possible, if open-source models exist and GPUs exist in ever-increasing quantities and ever-greater power?
Sure? No, not at all sure.