Notably, even just delaying it until we can (safely) automate large parts of AI safety research would be a very big deal in itself, and intuitively it seems quite tractable to me. E.g., the task-time-horizons required seem to be (only) ~100 hours for a lot of prosaic AI safety research:
https://x.com/BogdanIonutCir2/status/1948152133674811518
Based on current trends from https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/, this could already have happened sometime between 2027 and 2030.
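As a rough back-of-envelope, here's a minimal sketch of that extrapolation. The parameters are assumptions loosely taken from METR's post (a ~1-hour 50%-success time horizon around early 2025, doubling every ~7 months on the longer-run trend or ~4 months on the faster 2024–2025 trend), not measurements of my own:

```python
import math
from datetime import date, timedelta

# Assumed parameters, loosely based on METR's March 2025 post:
# ~1-hour 50%-success time horizon around early 2025, doubling every
# ~7 months (2019-2025 trend) or ~4 months (2024-2025 trend).
baseline_date = date(2025, 3, 1)   # rough anchor for the ~1 hour horizon
baseline_hours = 1.0
target_hours = 100.0               # horizon needed for much prosaic safety research

for doubling_months in (4, 7):
    doublings = math.log2(target_hours / baseline_hours)  # ~6.64 doublings
    months_needed = doublings * doubling_months
    eta = baseline_date + timedelta(days=months_needed * 30.44)
    print(f"doubling every {doubling_months} mo -> ~100h horizon around {eta:%Y-%m}")
```

This lands around mid-2027 on the fast trend and early 2029 on the slow one; the spread in the 2027–2030 window comes almost entirely from which doubling time you believe.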
And this suggests a ~100x acceleration in research cycles if ideation + implementation were automated and humans were relegated to peer-reviewing the AI-published papers:
https://x.com/BogdanIonutCir2/status/1940100507932197217
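For intuition on where a ~100x could come from, here's an Amdahl's-law-style framing of my own (the fractions below are hypothetical, not taken from the linked thread): if a fraction of each research cycle is automated to near-zero human time, the speedup is capped by the share that still needs a human.

```python
# Amdahl-style bound (my framing, not necessarily the linked thread's):
# if a fraction p of a research cycle is automated away, the remaining
# human work (e.g. peer review) caps the speedup at 1 / (1 - p).
for automated_fraction in (0.9, 0.99, 0.999):
    print(f"automating {automated_fraction:.1%} of the cycle -> "
          f"<= {1 / (1 - automated_fraction):.0f}x speedup")
```

On this framing, ~100x corresponds to human reviewers accounting for only ~1% of each cycle's total effort.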