Yes. RL will at least be more applicable to well-defined tasks. Some intuitions:
- In my day-to-day work, the gap between models' ability on well-defined tasks and their ability to work in the METR codebase is growing.
- A 4-month doubling time is faster than the rate of progress in most other domains, realistic or unrealistic (see the sketch after this list).
- Recent models really like to reward hack, which suggests RL can instill behaviors that don't carry over to realistic tasks.
- This trend will break at some point, e.g. when labs get better at applying RL to realistic tasks, or when RL hits diminishing returns, but I have no idea when.
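To make the compounding concrete, here is a minimal sketch of what a 4-month doubling time implies for task horizon over time. The 1-hour starting horizon is a hypothetical illustration, not a measured figure.

```python
# Minimal sketch: exponential growth of task horizon under a
# 4-month doubling time. The 1-hour starting point is a
# hypothetical illustration, not a measured METR number.

def task_horizon(initial_hours: float, months: float,
                 doubling_months: float = 4.0) -> float:
    """Task horizon after `months` of growth at the given doubling time."""
    return initial_hours * 2 ** (months / doubling_months)

for months in (0, 4, 12, 24):
    print(f"after {months:2d} months: {task_horizon(1.0, months):5.1f} hour horizon")
# after  0 months:   1.0 hour horizon
# after  4 months:   2.0 hour horizon
# after 12 months:   8.0 hour horizon
# after 24 months:  64.0 hour horizon
```

A 64x increase in two years is much steeper than progress in most other domains, which is why the doubling time carries so much weight here.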