Why I’m in AI sequence: 2020 Journal entry about gpt3
I moved from math academia to full-time AI safety a year ago—in this I’m in the same boat as Adam Shai, whose reflection post on the topic I recommend you read instead of this.
Before making the decision, I spent a lot of time thinking about (and attempting to learn about) AI. Much of that thinking was about whether a pure math academic can make a positive difference in AI, including examples that I thought contraindicated this. I finally decided this might be a good idea after talking extensively with my sister Lizka and doing MATS in summer 2023. I’m thinking of writing a more detailed post about my decision and thinking later, in case there are other academics considering this transition (and in that case, feel free to reach out by PM!).
But one thing I have started to forget is how scary and visceral AI risk felt when I was making the decision. I’m both glad and a little sad that the urgency is less visceral and more theoretical now. AI is “a part of the world”, not an alien feature: part of the “setting” in the Venkat Rao post that was part of my internal lexicon at the time.
For now, in order to fill a gap in my constantly flagging daily writing schedule, I’ll share a meandering entry from 2020 about how I thought about positive AI futures. I don’t endorse a lot of it; much is simplistic and low-context, or alternatively commonplace in these circles, though some of it holds up. Reading it back, it’s interesting that what I chose as a first attempt at orienting my thinking was fleshing out “positive futures” and what they might entail. Two big directional updates I’ve had since are thinking harder about “human alignment” and “human takeover”, and tempering predictions that assume a singularitarian “first-past-the-post” AGI in favor of the messier “AI-is-kinda-AGI” world that we will likely end up in.
journal entry
7/19/2020 [...] I’m also being paranoid about GPT-3.
Let’s think. Will the world end, and if so, when? No one knows, obviously. GPT-3 is a good text generation bot. It can figure out a lot about semantics, mood, style, even a little about humor. It’s probably not going to take over the world yet. But how far away are we from AGI?
GPT-3 makes me think, “less than a decade”. There’s a possibility it will be soon (within the year). I’d assign that probability 10%. It felt like 20% when I first saw its text, but seeing Sam Altman’s remark and thinking a little harder, I don’t think it’s quite realistic for it to go AGI without a significant extra step or two. I think I’d give it on the order of 50% within the decade. So it’s a little like living with a potentially fatal disease, with a prognosis of 10 years. Now, we have no idea what AGI will be like. It will most likely be either very weird and deadly, or revolutionary and good though disappointing in some ways. I think there’s not much we can do about the weird and deadly scenarios. Humans have lived in sociopathic times (see Venkat’s notes on his 14th Century Europe book). It would probably be shorter and deadlier than the plague; various “human zoo” scenarios may be pleasant to experience (after all, zoo animals are in general happier than wild ones, at least from the point of view of basic needs), but harrowing to imagine. In any case, it’s not worth speculating on this.
What would a good outcome look like? Obviously, no one knows. It’s very hard to predict our interaction with a super-human intelligence. But here are some pretty standard “decent” scenarios:
(1) After a brief period of a pro-social AI piloted by a team of decent people, we end up with a world much like ours but with AI capabilities curbed for a long period of time [...]. If it were up to me, I would design this world with certain “guard rail”-like changes: to me this would be a “Foundation”-style society somewhere in New Zealand (or on the bottom of the ocean perhaps? the moon?) consisting of people screened for decency, intelligence, etc. (but with serious diversity and variance built in). It would control the world’s nukes and have the responsibility of imposing very basic non-interference and freedom-of-immigration criteria on the world’s societies (i.e., basically making the “archipelago” dream a reality). So: enforcing no torture, disincentivizing violent conflict, and imposing various controls to make sure people can move from country to country and are exposed to the basic existence of a variety of experiences in the world, but allowing for culturally alien or disgusting practices in any given country, such as Russian homophobia, strict Islamic law, unpleasant-seeming (for Western Europeans) traditions in certain tribal cultures, etc. This would be combined with some sort of non-interventionist altruistic push. In this sci-fi scenario the Foundation-like culture would have a de facto monopoly over the digital world (but use it sparingly) and also a system of safe nuclear power plants sufficient to provide the world’s power (but turned on carefully and slowly, to prevent economic jolts), and would carefully and “incontrovertibly” turn most of the proceeds into a universal basic income for the entire world population. Obviously this would have to be carefully thought out first by a community of intelligent and altruistic people with clear rules of debate/decision.
(The above was written while extremely sleepy.) [...]
(2) (Unlikely) AI becomes integrated with humans (at first decent and intelligent ones, later all who are interested) via some kind of mind-machine interface, or alternatively a faithful modeling of humans in silico. Via a very careful and considered transition (in some sense “adiabatic”, i.e. designed so as not to lose any of our human ethos and meaning that can possibly be recovered safely), we become machines, with a good and meaningful (not wireheaded, other than by considered choice) world left for the hold-outs who chose to remain purely human.
(3) The “Her” scenario: AI takes off on its own, because of human carelessness or desperation. It develops in a way that cherishes and almost venerates humans, and puts effort into making a good, meaningful existence for humans (meaningful and good in sort of the above adiabatic sense, i.e. meaningful via a set of clearly desirable stages of progress from step to step, without hidden agendas, and carefully and thoughtfully avoiding creating or simulating, in an appropriate sense, anything that would be considered a moral horror by locally reasonable intelligences at any point in the journey). AI continues its own existence, either self-organized to facilitate this meaningful existence of humans or doing its own thing, in a clearly separated and “transcendent” world, genuinely giving humans a meaningful amount of self-determination, while also setting up guardrails to prevent horrors and also perhaps eliminating or mitigating some of the more mundane woes of existence (something like cancer, etc.) without turning us into wireheads.
(4) [A little less than ideal by my book, but probably more likely than the others]: The “garden of plenty” scenario. AI takes care of all human needs and jobs, and leaves all humans free to live a nevertheless potentially fulfilling existence, like aristocrats or Victorians but less classist (socializing, learning, reading, etc.), with the realization that all they are doing is a hobby: perhaps “human-generated knowledge” would be a sort of sport, or an analog of organic produce (homeopathically better, but via a game that makes everyone who plays it genuinely better in certain subtle ways). Perhaps AI will make certain “safe” types of art, craft and knowledge (maybe math! Here I’m obviously biased toward my work’s meaning not becoming fully automated) purely the domain of humans, to give us a sense of self-determination. Perhaps humans are guided through a sort of accelerated development over a few generations to get to scenario no. 2.
(5) There is something between numbers 3 and 4 above, less ideal than all of the above but likely, where AI quickly becomes an equal player to humans in the domain of meaning-generation, and sort of fills up space with itself while leaving a vaguely better (maybe number 4-like) Earth to humans. Perhaps it imposes a time limit on humans (enforced via a fertility cap, hopefully with the understanding that humans can raise AI babies with a genuine sense of filial consciousness, complete with bizarre scenes of trying to explain the crazy world of AI to their parents), after which the human project becomes the AI project, probably essentially incomprehensible to us.
I have a sense that while I’m partial to scenarios 1 and 2 (I want humans to retain the monopoly on meaning-generation and to be able to feel empowered and important), this preference will be seen as old-fashioned and almost dangerous by certain of my peers because of its lack of emphasis on harm-prevention, a stable future, etc. I think this is part of a very serious debate, so far abstract and fun but, as AI gets better, perhaps turning heated and loud, over whether comfort or meaning is the more important goal of the human project (and both sides will get weird). I am firmly on the side of meaning, with a strict underpinning of retaining bodily and psychological integrity in all the object-level and meta-level senses (except I guess I’m ok with moving to the cloud eventually? Adiabatic is the word for me). Perhaps my point of view only falls on the side I think it does within the weird group of futurists and rationalists that I mostly read when reading about AI: probably the generic human who thinks about AI is horrified by all of the above scenarios and just desperately hoping it will go away on its own, or has some really idiosyncratic mix of the above or other ideas which seem obviously preferable to them.