Top lesson from GPT: we will probably destroy humanity “for the lulz” as soon as we are able.

Forget complicated “sharp left turn” scenarios, nefarious nanobots, lists of lethalities, out-of-distribution actions, failed AI boxing. As Zvi pointed out in multiple posts, like this one, if humans get unrestricted access to a powerful enough tool, it is all over. People will intentionally twist even the most aligned Tool AI into an Agent of Death, long before it becomes superintelligent and able to resist. You can already find examples of this online.

In that sense, Eliezer was wrong in the worst possible way: we have a lot less time to get our act together, because capabilities advance faster than intelligence, and humans are very inventive at finding ways to misuse those capabilities. We will push them in the “gain of function” direction mercilessly, without regard for safety or anything else. We are worse than toddlers playing with matches. True, as with toddlers, our curiosity far outstrips our sense of self-preservation, probably because our brains are not wired to fear anything that does not look like a snake, a spider, or a steep cliff. But it is worse than that. Unlike a toddler, people can track the potential harm logically, yet they will still try to do the worst thing imaginable because they do not “alieve” it.

I guess the silver lining is that we have a bit of time to iterate. The AI tools are not yet at the level of causing widespread destruction, and probably will not be for some time. That does not mean we will be well prepared if and when some superintelligence emerges, but if humanity survives until then without self-annihilation, we might have a better chance than the “one shot at getting it right” before we are all wiped out that Eliezer emphasized. It might not be an “alignment manual from the surviving future”, but at least some wisdom and discipline in avoiding the early pitfalls, and if we die, we might die with “more dignity”. The odds are not great, but they are not zero.

Edit: quanticle pointed out that Bostrom predicted this in his paper The Vulnerable World Hypothesis: