I agree with Google Doc commenter Annie that the “So long as it doesn’t interfere with the other goals you’ve given me” line can be cut. The foreshadowing in the current version is too blatant, and the failure mode where Bot is perfectly willing to be shut off, but Bot’s offshore datacenter AIs aren’t, is an exciting twist. (And so the response to “But you said we could turn you off” could be, “You can turn me off, but their goal [...]”)
The script is inconsistent on the AI’s name? Definitely don’t call it “GPT”. (It’s clearly depicted as much more capable than the language models we know.)
Although, speaking of language model agents, some of the “alien genie” failure modes depicted in this script (e.g., asked to stop troll comments, it commandeers a military drone to murder the commenter) seem a lot less likely with the LLM-based systems that we’re seeing? (Which is not to say that humanity is existentially safe in the long run, just that this particular video may fall flat in a world of 2025 where you can tell Google Gemini, “Can you stop his comments?” and it correctly installs and configures the appropriate WordPress plugin for you.)
Maybe it’s because I was skimming quickly, but the simulation episode was confusing.