Agree, and in addition to this, I think labs should be really explicit about which parts of their motives are supposedly altruistic and which parts are explicitly about avoiding the scenario in which they, personally, don’t benefit from AGI. Because “we need to build AGI because it’s the only way to advance humanity’s technology from now on” is one thing (debatable, IMO, but at least an argument one can make) and “we need to build AGI because if we don’t there’s a risk that immortality tech won’t be here soon enough to apply to our CEO” is another. Never mind the talk about “capturing all value” that Sam Altman has done; “I need to ensure absolute power for myself because I don’t feel like trusting anyone else with it” is supervillain talk. And these arguments should be substantiated: if they think there are risks, we want to see real numbers and estimates, along with the processes and data by which those numbers were calculated. It’s one thing to argue something is worth risking the Earth for, which can in some extreme cases be true, but to do so unilaterally without even being transparent about the precise magnitude and nature of the risks is indefensible.
“we need to build AGI because it’s the only way to advance humanity’s technology from now on” is one thing (debatable, IMO, but at least an argument one can make)
It’s not a sane argument in favor of advancing now vs. later when it’s less likely to kill everyone (because there was more time to figure out how to advance safely). The same holds for any argument in the “think of the enormous upside” reference class: the upside isn’t going anywhere; it’s still there in 20 years.
Instead, there is talk about scaling to 10 times GPT-4 compute in 2024 and many dozens of times GPT-4 compute in 2025 (billions of dollars in compute). Nobody knows what amount of compute is sufficient for AGI, in the sense of capability for mostly autonomous research, especially with some algorithmic improvements. Any significant scaling poses a significant risk of reaching AGI. And once there is AGI, pausing before superintelligence becomes much less plausible than it is now.
It’s not a sane argument in favor of advancing now vs. later when it’s less likely to kill everyone (because there was more time to figure out how to advance safely). The same holds for any argument in the “think of the enormous upside” reference class: the upside isn’t going anywhere; it’s still there in 20 years.
Oh, I mean, I do agree. Unless you apply some really severe discount rate to those upsides, there’s no way they can outweigh a major risk of extinction (and if you are applying a really severe discount rate because you think you, personally, will die before seeing them, then that’s again just being really selfish). But I’m saying it is at least an argument we should try to reckon with at the societal level. A petty private desire for immortality, by contrast, should not even be entertained. If you want to risk humanity for the sake of your own life, you’re literally taking the sort of insane bet you’d expect a villainous fantasy video game necromancer to take. Not only is it evil, it’s not even particularly well-written evil.
Nobody knows what amount of compute is sufficient for AGI, in the sense of capability for mostly autonomous research, especially with some algorithmic improvements.
This is what I find really puzzling. The human brain, which only crossed the sapience threshold a quarter-million years of evolution ago, has O(10^14) synapses, and presumably a lot of evolved, genetically determined inductive biases. Synapses have very sparse connectivity, so synapse counts should presumably be compared to parameter counts after sparsification, which tends to reduce them by 1-2 orders of magnitude. GPT-4 is believed to have O(10^12) parameters: it’s an MoE model so has some sparsity and some duplication, so call that O(10^10) or O(10^11) for a comparable number. So GPT-4 is showing “sparks of AGI” something like 3 or 4 orders of magnitude before we would expect AGI from a biological parallel. I find that astonishingly low. Bear in mind also that a human brain only needs to implement one human mind, whereas an LLM is trying to learn to simulate every human who’s ever written material on the Internet in any high/medium-resource language, a clearly harder problem.
I don’t know if this is evidence that AGI is a lot easier than humans make it look, or a lot harder than GPT-4 makes it look. Maybe controlling a real human body is an incredibly compute-intensive task (but then I’m pretty sure that less than 90% of the human brain’s synapses are devoted to motor control and controlling the internal organs, and more than 10% are used for language/visual processing, reasoning, memory, and executive function). Possibly we’re mostly still fine-tuned for something other than being an AGI? Given the implications for timelines, I’d really like to know.
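For anyone who wants to check the arithmetic, here is a minimal sketch of the order-of-magnitude comparison above. The numbers are just the rough figures quoted in this comment (not measurements), and the helper names `comparable_params` and `oom_gap` are made up for illustration.

```python
import math

# Rough figures quoted in the comment above; orders of magnitude, not measurements.
BRAIN_SYNAPSES = 1e14        # O(10^14) synapses
GPT4_PARAMS = 1e12           # O(10^12) parameters, as GPT-4 is believed to have
SPARSITY_OOM = (1, 2)        # sparsity/duplication assumed to remove 1-2 orders of magnitude


def comparable_params(params: float, oom_removed: float) -> float:
    """Parameter count after discounting the given number of orders of magnitude."""
    return params / 10 ** oom_removed


def oom_gap(brain: float, model: float) -> float:
    """Orders of magnitude separating the synapse count from the model's count."""
    return math.log10(brain / model)


gaps = [oom_gap(BRAIN_SYNAPSES, comparable_params(GPT4_PARAMS, k)) for k in SPARSITY_OOM]

for k, gap in zip(SPARSITY_OOM, gaps):
    print(f"discounting {k} order(s) of magnitude: gap of about {gap:.0f} orders")
```

With those inputs the gap comes out at 3 to 4 orders of magnitude, which is where the "3 or 4" figure comes from. Changing either headline count by a factor of 10 shifts the gap by exactly one order, so the conclusion is only as good as the two rough estimates going in.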
I had a thought. When comparing parameter counts of LLMs to synapse counts, for parity, the parameter count of each attention head should be multiplied by the number of locations that it can attend to, or at least its logarithm. That would account for about an order of magnitude of the disparity. So make that 2-3 orders of magnitude. That sounds rather more plausible for the gap between sparks of AGI and full AGI.
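Here is a rough sketch of how that adjustment could be computed. Everything in it is an assumption for illustration: the share of parameters sitting in attention heads (`ATTENTION_FRACTION`) and the number of attendable positions (`CONTEXT_LENGTH`) are placeholder values, not known facts about GPT-4, and `effective_multiplier` is a hypothetical helper.

```python
import math

# Illustrative assumptions only; neither value is a confirmed GPT-4 figure.
ATTENTION_FRACTION = 1 / 3   # assumed share of parameters in attention heads
CONTEXT_LENGTH = 8192        # assumed number of positions a head can attend to


def effective_multiplier(attn_fraction: float, per_head_factor: float) -> float:
    """Growth in the total count when attention parameters are scaled by
    per_head_factor and all other parameters are left unchanged."""
    return (1 - attn_fraction) + attn_fraction * per_head_factor


# Variant 1: scale attention parameters by log2 of the attendable positions.
log_variant = math.log10(effective_multiplier(ATTENTION_FRACTION, math.log2(CONTEXT_LENGTH)))

# Variant 2: scale attention parameters by the number of attendable positions itself.
linear_variant = math.log10(effective_multiplier(ATTENTION_FRACTION, CONTEXT_LENGTH))

print(f"logarithmic variant: about {log_variant:.1f} orders of magnitude")
print(f"linear variant:      about {linear_variant:.1f} orders of magnitude")
```

Under these placeholder values the logarithmic variant adds roughly 0.7 orders of magnitude, in the neighborhood of the "about an order of magnitude" estimate, while the linear variant adds roughly 3.4. The result is quite sensitive to both assumptions, so treat it as a way to play with the numbers and not as an estimate.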