This post makes a lot of very confident predictions:
[human judges] will be able to directly inspect, analyze, search and compare agent mind states and thought histories, both historical and in real-time.
AGI will almost certainly require recursion: our great creative achievements rely on long iterative recursive thought trains implementing various forms of search/optimization over inner conceptual design spaces
AGI will likely require new approaches for running large ANNs on GPUs [42], or will arrive with more neuromorphic hardware.
Early AGI will likely require a small supercomputer with around 100 to 1000 high end GPUs using model parallelism
a 1000 GPU cluster will be able to run 100 to 1000 agents in parallel at real-time speed or greater
efficient designs will find ways to compress any correlations/similarities/regularities across inter-agent synaptic patterns
DL based AGI will not be mysterious and alien; instead it will be familiar and anthropomorphic
AGI will be a generic/universal learning system like the brain [and] will necessarily be human as AGI will grow up immersed in human culture, learning human languages and absorbing human knowledge.
Evolution found means to temper and align empowerment[52], mechanisms we will reverse engineer for convergent reasons
AGI will be born of our culture, growing up in human information environments
AGI will mostly have similar/equivalent biases—a phenomenon already witnessed in large language models
If you train/raise AGI in a human-like environment, [...], then its self-optimizing internal world model will necessarily learn efficient sub-models of these external agents and their values/goals. Theory of mind is Inverse Reinforcement Learning.
It seems to make these predictions with a time horizon of a decade (“soon”).
I’m not saying that these aren’t plausible avenues, but to me, this comes across as overconfident (it might be a stylistic choice, but I think that is also problematic in the context of AGI Safety).
It might make sense to link these statements to ongoing predictions on Metaculus.
I find it interesting that you consider “will likely” to be an example of “very confident”, whereas I’m using that specifically to indicate uncertainty, as in “X is likely” implies a bit over 50% odds on some cluster of ideas vs others (contingent on some context), but very far from certainty or high confidence.
The only prediction directly associated with a time horizon is the opening prediction of AGI most likely this decade. Fully supporting/explaining that timeline prediction would probably require a short post, but it mostly reduces to: the surprising simplicity of learning algorithms, the dominance of scaling, and of course brain efficiency, which together imply AGI arrives predictably around or a bit after brain parity near the end phase of Moore’s law. The early versions of this theory have already made many successful postdictions/predictions[1].
Looking at the Metaculus prediction for “Date Weakly General AI is Publicly Known”, I see the median was in the 2050s just back in early 2020, had dropped down to around 2040 by the time I posted on brain efficiency earlier this year, and is now down to 2028: equivalent to my Moravec-style prediction of most likely this decade. I will take your advice to link that timeline prediction to Metaculus, thanks.
Most of the other statements are contextually bound to a part of the larger model in the surrounding text and should (hopefully obviously) not be interpreted out of context as free-floating unconditional predictions.
For example:
“[human judges] will be able to directly inspect, analyze, search and compare agent mind states and thought histories, both historical and in real-time.”
This is a component of a larger design proposal, which involves brain-like AGI with inner monologues and other features that make that capability rather obviously tractable.
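To make the inspection claim concrete, here is a minimal sketch of what such tooling could look like for an agent whose inner monologue is logged as text; all names and structures below are hypothetical illustrations, not part of the design in the post.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ThoughtRecord:
    """One entry in an agent's logged inner monologue (hypothetical schema)."""
    agent_id: str
    timestamp: datetime
    text: str                  # the monologue step, assumed to be text-like
    activations_ref: str = ""  # optional pointer to stored internal state

@dataclass
class ThoughtLog:
    records: list = field(default_factory=list)

    def append(self, record: ThoughtRecord) -> None:
        self.records.append(record)

    def search(self, keyword: str) -> list:
        """Naive keyword search over historical thoughts; a real system
        might use embedding similarity or richer queries instead."""
        return [r for r in self.records if keyword.lower() in r.text.lower()]

# A human judge querying an agent's thought history after the fact.
log = ThoughtLog()
log.append(ThoughtRecord("agent-7", datetime.now(), "plan: ask the user before acting"))
print(log.search("ask the user"))
```

The point is only that once thoughts exist as inspectable records, searching and comparing them is ordinary engineering.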
Imagine the year is 1895 and I’ve written a document describing how airplanes could work, and you are complaining that I’m making an overconfident prediction that “human pilots will be able to directly and easily control the plane’s orientation in three dimensions: yaw, pitch, and roll”. That’s a prediction only in the sense of being a design prediction, and only in a highly contextual sense contingent on the rest of the system.
“I’m not saying that these aren’t plausible avenues, but to me, this comes across as overconfident (it might be a stylistic choice, but I think that is also problematic in the context of AGI Safety).”
I’m genuinely more curious which of these you find the most overconfident/unlikely, given the rest of the design context.
Perhaps these?:
DL based AGI will not be mysterious and alien; instead it will be familiar and anthropomorphic
AGI will be born of our culture, growing up in human information environments
AGI will mostly have similar/equivalent biases—a phenomenon already witnessed in large language models
Sure, these were highly controversial/unpopular opinions on LW back in 2010 when I first started saying that AGI would be anthropomorphic, that brains are efficient, etc. That was long before DL, when nearly everyone on LW thought AGI would be radically different from the brain (ironically based mostly on the Sequences: a huge wall of often unsubstantiated, confident philosophical doctrine).
But on these issues regarding the future of AI, it turns out that I (along with Moravec/Kurzweil/etc.) was mostly correct, and EY/MIRI/LW was mostly wrong; it seems MIRI folks concur to some extent, and some on LW have updated. The model difference that led to divergent predictions about the future of AI is naturally associated with different views on brain efficiency[2] and divergent views on the tractability of safety strategies[3].
For example, the simple Moravec-style model, which predicts AI task parity around the time of flop parity with equivalent brain regions, roughly predicted DL milestones many decades in advance, and the timing of NLP breakthroughs a la LLMs is/was also predictable based on total training flops equivalence to the brain’s linguistic cortex.
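To make the shape of that flops-parity argument concrete, here is a back-of-envelope sketch; every number in it is an assumed, commonly cited order-of-magnitude figure chosen for illustration (not a value taken from the post), and shifting any of them shifts the conclusion accordingly.

```python
# Back-of-envelope sketch of a "training-compute parity" comparison.
# Every figure is an assumed order-of-magnitude placeholder, not a value from
# the post; the output is only as meaningful as these assumptions.
language_cortex_synapses = 1e13   # assumed: ~1e13 synapses in language-related cortex
ops_per_synaptic_event = 10       # assumed flop-equivalent cost per synaptic event
events_per_synapse_hz = 1.0       # assumed average synaptic event rate
training_seconds = 1e9            # assumed: roughly 30 years of waking experience

brain_language_training_ops = (language_cortex_synapses * ops_per_synaptic_event
                               * events_per_synapse_hz * training_seconds)
llm_training_flops = 3e23         # published estimate for a GPT-3-scale training run

print(f"brain language-cortex lifetime ops ~ {brain_language_training_ops:.0e}")
print(f"large LM training flops            ~ {llm_training_flops:.0e}")
print(f"ratio (LM / brain)                 ~ {llm_training_flops / brain_language_training_ops:.0f}")
```

With these particular placeholders the two quantities land within an order of magnitude of each other, which is the claim being illustrated; the sketch is a frame for the argument, not evidence for it.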
EY was fairly recently claiming that brains were about half a million times less efficient than the thermodynamic limit.
For example, see this comment where Rob Bensinger says, “If we had AGI that were merely as aligned as a human, I think that would immediately eliminate nearly all of the world’s existential risk.”, but then for various reasons doesn’t believe that’s especially doable.
Thank you for the clarification.
I didn’t mean that each one individually was very overconfident; I just listed all the predictions. But there were no “might” or “plausibly” or even “more likely than not” (which I would see as >50%). I would read a “will be able to X” as >90% confidence, and there are many of these.
But your explanation that each statement should be read with the implicit qualification “assuming the contextual model as given” clarifies this. I’m not sure the majority will read it like that, though.
I think the most overconfident claim is this:
AGI will almost certainly require recursion
I’m no longer sure you mean the statements as applying within a ten-year horizon, but among the statements, I think the one about the human judges is the furthest out because it mostly depends on the others being achieved (GPU clusters running agents, etc.).
Yeah, in retrospect I probably should reword that, as it may not convey my model very well. I am fairly confident that AGI will require something like recursion (or recurrence, actually), but that something is, more specifically, information flow across time (over various timescales) and across the space of intermediate computations; you can also get that from memory mechanisms.
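A minimal NumPy sketch of the two routes to that kind of information flow across time, recurrence versus an explicit memory mechanism; the dimensions and update rules below are made up for illustration and are not the architecture under discussion.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # assumed feature size
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def recurrent_step(h, x):
    """Recurrence: the new state depends on the previous state, so
    information propagates across time through h."""
    return np.tanh(W_h @ h + W_x @ x)

def memory_step(memory, x):
    """Memory mechanism: no carried hidden state; the current input
    instead attends over explicitly stored past activations."""
    memory = np.vstack([memory, x])   # write the current input
    scores = memory @ x               # dot-product relevance to the past
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    read = weights @ memory           # read a blend of past entries
    return memory, read

h = np.zeros(d)
memory = np.zeros((0, d))
for _ in range(5):
    x = rng.normal(size=d)
    h = recurrent_step(h, x)
    memory, read = memory_step(memory, x)
```

In the recurrent version the past is compressed into h; in the memory version it stays explicitly stored and is re-read on demand, which is the sense in which memory mechanisms can substitute for recurrence.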
Just for the record: I am working on a brain-like AGI project, and I think approaches that simulate agents in a human-like environment are important and will plausibly give a lot of insights into value acquisition in AI and humans alike. I’m just less confident about many of your specific claims.
I find it interesting that you consider “will likely” to be an example of “very confident”, whereas I’m using that specifically to indicate uncertainty, as in “X is likely” implies a bit over 50% odds on some cluster of ideas vs others (contingent on some context), but very far from certainty or high confidence.
This is a common failure mode when communicating uncertainty. If you think of “likely” as meaning some specific probability range, and you think it matters in that instance, state that probability range instead. People’s perception of what “probable” means ranges from around 20% to 80%, IIRC from reading Tetlock’s Superforecasting. If you need more evidence, this is easily verified by just asking 2-3 people what they think “likely” means.