The report is partially optimistic, but the results seem unambiguously bearish.
Like, yeah, maybe some of these problems could be solved with scaffolding, but the first round of scaffolding failed. And if you're going to spend a lot of time iterating on scaffolding, you could probably write a decent bot that doesn't use Claude in that same time. Then you wouldn't be vulnerable to bizarre hallucinations, which seem like an unacceptable risk.