Seth Herd comments on Scaffolded LLMs: Less Obvious Concerns

Seth Herd 21 Jun 2023 18:53 UTC
I’ll show you that draft when it’s ready; thanks for the offer!

A couple of thoughts:

At this point I’m torn between optimism based on the better interpretability and pessimism based on the multipolar scenario. The timeline doesn’t bother me that much, since I don’t think more general alignment work would help much in aligning those specific systems if they make it to AGI. Of course, I’d still like a longer timeline so that I and others can keep enjoying life. My optimism is relative, and I still put something like a vague 50% chance on failure.

Shorter timelines have an interesting advantage of avoiding compute and algorithm overhangs that create fast, discontinuous progress. This new post makes the case in detail. I’m not at all sure this advantage outweighs the loss of time to work on alignment, since that’s certainly helpful.

https://www.lesswrong.com/posts/YkwiBmHE3ss7FNe35/short-timelines-and-slow-continuous-takeoff-as-the-safest

So I’m entirely unsure whether I wish no one had thought of this. But in retrospect it seems like too obvious an idea to miss. The fact that almost everyone in the alignment community (including me) was blindsided by it seems like a warning sign that we need to work harder to predict new technologies and not fight the last war. One interesting factor is that many of us who saw this or had vague thoughts in this direction never mentioned it publicly, to avoid helping progress; but the hope that no one would soon think of such an obvious idea was, in retrospect, totally unreasonable.