Sparks of RSI?
Are your long-running agents self-improving in loops with minimal prompting? Mine sure are!
I think we’re seeing the first sparks of RSI (recursive self-improvement) here, folks. I expect the frontier labs to scramble furiously to push this forward, finding and patching the meta-failure-modes, so the next versions should be even better at this.
Here’s what some other people are saying/claiming:
https://x.com/shreyasnsharma/status/2032567729560105117
https://x.com/varun_mathur/status/2032671842230501729
https://x.com/TuXinming/status/2032478765033701835
https://x.com/andrewwhite01/status/2031761577943425475
https://x.com/aramh/status/2029553870502756706
https://x.com/polynoamial/status/2029622090152956335
https://t.co/znsJlcww5r
And many more. These are just a few examples. Not super impressive so far, but suppose this “task” goes the way many others have: first showing signs of progress in the 1-3% accuracy range, then rapidly shooting upwards over the next couple of model versions... Yeah.
Basically, I think we’re in crunch time. Automated alignment time is here. Get cracking.