I’d frame the pace of progress on RL environments with a simple 2×2.
Is the task bounded (Codeforces, IMO-style problems) or unbounded (financial analysis in Excel, executive communication in slides, coding in unstructured codebases, design work in Photoshop, etc.)?
Do we have in-house expertise? Yes for coding, and easy to source for IMO-style math; often no elsewhere (OpenAI is hiring finance professionals to help build evals for financial agents as I write this comment). Expertise helps companies build RL environments that better reflect the actual problem space.
That gives a rough order of progress:
Bounded problem + know-how: o3 preview crushed Codeforces in Dec 2024.
Unbounded problem + know-how: the Codex product line.
Unbounded problem + limited know-how: ChatGPT agents are still weak at spreadsheets and terrible at slides today, but I expect that to change within 6 to 12 months.
I’m not sure where bounded problems with little know-how (e.g. FrontierMath) fall in this, though…