Nothing to apologize for, it was reasonably clear, I’m just trying to learn more about what you believe and why. This has been helpful, thanks!
I totally agree that in fast takeoff scenarios we are less likely to spot those things until it’s too late. I guess I agree that truthful LM work is less likely to scale gracefully to AGI in fast takeoff scenarios… so I guess I agree with your overall point… I just notice I feel a bit confused and muddled about it, is all. I can imagine plausible slow-takeoff scenarios in which truthful LM work doesn’t scale gracefully, and plausible fast-takeoff scenarios in which it does. At least, I think I can.

The former scenario would be something like: it turns out the techniques we develop for making dumb AIs truthful stop working once the AIs get smart, for similar reasons that the techniques we use to make small children be honest (or, to put it more vividly, believe in Santa) stop working once they grow up.

The latter scenario would be something like: actually that’s not the case; the techniques work all the way up past human-level intelligence, and “fast takeoff” in practice means “throttled takeoff,” where the leading AI project knows it has a few-month lead over everyone else and is using those months to do some sort of iterated distillation and amplification, in which it’s crucial that the early stages be truthful and that the techniques scale to stage N overseeing stage N+1.