This seems to be as good of a place as any to post my unjustified predictions on this topic, the second of which I have a bet outstanding on at even odds.
1. Devin will turn out to be just a bunch of GPT-3.5/4 calls and a pile of prompts/heuristics/scaffolding so disgusting and unprincipled only a team of geniuses could have created it.
2. Someone will create an agent that gets 80%+ on SWE-Bench within six months.
I am not sure if 1. being true or false is good news. Both suggest we should update towards large jumps in coding ability very soon.
Regarding RSI, my intuition has always been that automating AI research will likely be easier than automating the development and maintenance of a large app like, say, Photoshop, so I don’t expect fire alarms like “non-gimmicky top 10 app on AppStore was developed entirely autonomously” before doom.
Someone will create an agent that gets 80%+ on SWE-Bench within six months.
I think this is probably above the effective cap on the current implementation of SWE-bench (where you can’t see test cases) because often test cases are specific to the implementation.
E.g. the test cases assume that a given method was named a particular thing even though the task description doesn’t specify.
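To make that failure mode concrete, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration (the task, the class names, the method names); it is not taken from SWE-bench itself. It only shows how a hidden test written against the reference fix can reject a behaviorally correct patch that picked a different name.

```python
# Hypothetical task description: "add a way to normalize whitespace in a
# document's text". Note that it never says what the method should be called.


class ReferenceDocument:
    """Stand-in for the upstream fix that the hidden test was written against."""

    def __init__(self, text):
        self.text = text

    def normalize_whitespace(self):
        # Collapse every run of whitespace into a single space.
        return " ".join(self.text.split())


class AgentDocument:
    """Stand-in for an agent's patch: same behavior, different method name."""

    def __init__(self, text):
        self.text = text

    def collapse_spaces(self):
        return " ".join(self.text.split())


def hidden_test(document_cls):
    """Stand-in for a hidden test that hard-codes the reference method name."""
    doc = document_cls("a   b \n c")
    return doc.normalize_whitespace() == "a b c"


def passes(document_cls):
    """Run the hidden test, counting a missing method as a failure."""
    try:
        return hidden_test(document_cls)
    except AttributeError:
        return False


print(passes(ReferenceDocument))  # True
print(passes(AgentDocument))      # False, despite identical behavior
```

If a meaningful fraction of tasks look like this, no agent can pass them without seeing the tests or guessing the name, which is the sense in which the cap would sit below 100%.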
1) …a pile of prompts/heuristics/scaffolding so disgusting and unprincipled only a team of geniuses could have created it
I chuckled out loud over this. Too real.
Also, regarding that second point, how do you plan to adjudicate the bet? It is worded as “create” here, but what can actually be seen to settle the bet will be the effects.
There are rumors coming out of Google including names like “AlphaCode” and “Goose” that suggest they might have already created such a thing, or be near to it. Also, one of the criticisms of Devin (and Devin’s likelihood of getting better fast) was that if someone really did crack the problem then they’d just keep the cow and sell the milk. Critch’s “tech company singularity” scenario comes to mind.
The bet is with a friend and I will let him judge.
I agree that providing an API to God is a completely mad strategy and we should probably expect less legibility going forward. Still, we have no shortage of ridiculously smart people acting completely mad.
I put ~5% on the part I selected, but there is no 5% emoji, so I thought I would mention it in a short comment.