I’d frame the pace of progress on RL environments with a simple 2×2.
Is the task bounded (Codeforces, IMO-style problems) or unbounded (financial analysis in Excel, executive communication in slides, coding in unstructured codebases, design work in Photoshop, etc.)?
Do we have in-house expertise? Yes for coding, and easy to source for IMO-style math; often no elsewhere (OpenAI is hiring finance professionals to help build evals for financial agents as I write this comment). Expertise helps companies build RL environments that better reflect the actual problem space.
That gives a rough order of progress:
Bounded problem + know-how: o3 preview crushed Codeforces in Dec 2024.
Unbounded problem + know-how: the Codex product line.
Unbounded problem + limited know-how: ChatGPT agents are still weak at spreadsheets and terrible at slides today, but I expect that to change within 6 to 12 months.
I’m not sure where bounded problems with little know-how (e.g. FrontierMath) fall in this, though…