Another factor here is that there are probably innovations that help with qualitative skill / SDs but cost much more inference compute. This could make the tradeoffs very different. I’m mostly neglecting this in my analysis.
Another factor is that you can potentially approximate a faster and smarter system well by combining a slower but smarter system with a dumber but faster one. This is part of why I expect improvements on all axes to yield larger gains than a narrower focus.
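A toy sketch of the mixing argument, under assumptions that are not part of the analysis: suppose a task is a sequence of steps, some fraction of which are "hard" and need the smarter system, while the rest can be handed to the faster one with no loss in quality. The function names, the `hard_fraction` parameter, and all the numbers are hypothetical and only illustrate the shape of the tradeoff.

```python
def time_slow_smart_only(n_steps: float, slow_speed: float) -> float:
    """Wall-clock time if the slower, smarter system does every step."""
    return n_steps / slow_speed


def time_mixed(n_steps: float, hard_fraction: float,
               slow_speed: float, fast_speed: float) -> float:
    """Wall-clock time if hard steps go to the slow, smart system and
    easy steps go to the fast, dumber system (assumed: perfect routing,
    no handoff overhead, steps run one after another)."""
    hard_steps = n_steps * hard_fraction
    easy_steps = n_steps - hard_steps
    return hard_steps / slow_speed + easy_steps / fast_speed


def time_fast_smart(n_steps: float, fast_speed: float) -> float:
    """Wall-clock time for a hypothetical system that is both fast and smart."""
    return n_steps / fast_speed


if __name__ == "__main__":
    # Hypothetical numbers: 1000 steps, 10% hard, fast system is 10x faster.
    n, f, slow, fast = 1000, 0.1, 1.0, 10.0
    print("slow and smart only:", time_slow_smart_only(n, slow))
    print("mixed:", time_mixed(n, f, slow, fast))
    print("fast and smart (ideal):", time_fast_smart(n, fast))
```

In this toy model the mix closes most of the gap to the ideal system when few steps are hard, and the remaining gap is set by the hard steps, which is one way to see why improving both systems at once can pay off more than improving only one.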