Thanks, this is a really interesting conversation to read!
One thing I have not seen discussed much from either of these viewpoints (or maybe it is there and I just missed it) is how rare frontier-expanding intelligence is among humans, and what that means for AI. Among humans, if you want to raise someone, it’s going to cost you something like 20-25 years and $200-500k. If you want to train a single scientist, on average you’re going to have to do this a few hundred to a thousand times. If you want to create a scientist in a specific field, many more times than that. And if you want to create the specific scientist in a specific field who is going to noticeably advance that field’s frontier, well, you might need to raise a billion humans before that happens, given the way we generally train humans.
If I went out in public and said, “Ok, based on this, in order to solve quantum gravity we’ll need to spend at least a quadrillion dollars on education,” the responses (other than rightly ignoring me) would be a mix of “That’s an absurd claim” and “We’re obviously never going to do that,” when in fact that’s just the default societal path viewed from another angle.
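To make the arithmetic behind that figure explicit, here is a back-of-envelope sketch; the inputs are just the rough numbers from the paragraphs above, not careful estimates:

```python
# Back-of-envelope: cost of "raising your way" to one frontier-advancing scientist,
# using the rough figures quoted above (not careful estimates).
cost_per_person = (200_000, 500_000)           # dollars to raise/educate one person
people_per_frontier_scientist = 1_000_000_000  # ~a billion shots on goal

low = cost_per_person[0] * people_per_frontier_scientist
high = cost_per_person[1] * people_per_frontier_scientist
print(f"${low:,} to ${high:,}")  # $200,000,000,000,000 to $500,000,000,000,000

# A few hundred trillion dollars per frontier-advancing scientist; a problem as
# hard as quantum gravity plausibly needs several billion shots, which is where
# "at least a quadrillion dollars" comes from.
```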
But in this, and even more so in AI, we only have to succeed once. In AI, we’re trying to do so in roughly all the fields at once, using a much smaller budget than we apply to training all the humans, while (in many cases) demanding comparable or better results before we are willing to believe AGI is within reach of our methods and architectures. Maybe this is a matter of shots-on-goal, as much as anything else, and better methods and insights mostly reduce the number of shots on goal needed (bringing the success rate up to superhuman levels) rather than expanding the space of possibilities those shots can access.
A second, related thought is that whenever I read statements like “For example, while GPT4 scored very well on the math SAT, it still made elementary-school mistakes on basic arithmetic questions,” I think, “This is true of me, and AFAIK all humans, as well.” I think it is therefore mostly irrelevant to the core question, until and unless we can characterize important differences in when and why it makes such mistakes, compared to humans (which do exist, and are being studied and characterized).
how rare frontier-expanding intelligence is among humans,
On my view, all human children (except in extreme cases, e.g. born without a brain) have this type of intelligence. Children create their conceptual worlds originarily. It’s not literally frontier-expanding because the low-hanging fruit have been picked, but it’s roughly the same mechanism.
Maybe this is a matter of shots-on-goal, as much as anything else, and better methods and insights mostly reduce the number of shots on goal needed (bringing the success rate up to superhuman levels) rather than expanding the space of possibilities those shots can access.
Yeah, but drawing from the human distribution is very different from drawing from the LP25 distribution. Humans all have the core mechanisms, and then you’re selecting over variation in genetic and developmental brain health / inclination towards certain kinds of thinking / life circumstances enabling thinking / etc. For LP25, you’re mostly sampling from a very narrow range of Architectures, probably none of which are generally intelligent.
So technically you could set up your laptop to generate a literally random Python script and run it every 5 minutes. Eventually this would create an AGI; you just need more shots on goal—but that tells you basically nothing. “Expanding the space” and “narrowing the search” are actually interchangeable in the relevant sense; by narrowing the search, you expand the richness of variations that are accessible to your search (clustered in the areas you’ve focused on). The size of what you actually explore is roughly fixed (well, bounded by however much compute you have), like an incompressible fluid—squish it in one direction and it bloops out bigger in another.
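As a toy illustration of that thought experiment (purely a sketch of the “shots on goal” point, obviously not a serious search procedure):

```python
import random
import string
import subprocess
import sys
import time

# Toy version of the "random script every 5 minutes" thought experiment.
# Nearly every candidate fails to even parse, which is the point: raw shots on
# goal tell you almost nothing unless the search is narrowed somewhere useful.

def random_script(length=200):
    alphabet = string.ascii_letters + string.digits + string.punctuation + " \n"
    return "".join(random.choice(alphabet) for _ in range(length))

while True:
    with open("candidate.py", "w") as f:
        f.write(random_script())
    try:
        subprocess.run([sys.executable, "candidate.py"], timeout=5)
    except subprocess.TimeoutExpired:
        pass  # a candidate that loops forever is still not an AGI
    time.sleep(300)  # one shot on goal every 5 minutes
```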
A second, related thought is that whenever I read statements like “For example, while GPT4 scored very well on the math SAT, it still made elementary-school mistakes on basic arithmetic questions,” I think, “This is true of me, and AFAIK all humans, as well.” I think it is therefore mostly irrelevant to the core question, until and unless we can characterize important differences in when and why it makes such mistakes, compared to humans (which do exist, and are being studied and characterized).
The distribution of mistakes is very different, and, I think, illuminates the differences between human minds and LLMs. (Epistemic status: I have not thoroughly tested the distribution of AI mistakes against humans, nor have I read thorough research which tested it empirically. I could be wrong about the shape of these distributions.) It seems like LLM math ability cuts off much more sharply (around 8 digits I believe), whereas for humans, error rates are only going to go up slowly as we add digits.
This makes me somewhat more inclined towards slow timelines. However, it bears repeating that LLMs are not human-brain-sized yet. Maybe when they get to around human-brain-sized, the distribution of errors will look more human.
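If one wanted to check the shape of that cut-off empirically, a minimal harness might look something like the sketch below; ask_model is a hypothetical placeholder for whatever model or API you wire in, and the predicted curve is the claim above, not an established result:

```python
import random

def ask_model(prompt: str) -> str:
    """Placeholder: wire this up to whichever model/API you want to test."""
    raise NotImplementedError

def accuracy_by_digit_count(max_digits=12, trials_per_size=50):
    """Estimate addition accuracy as a function of operand length."""
    results = {}
    for digits in range(1, max_digits + 1):
        correct = 0
        for _ in range(trials_per_size):
            a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            answer = ask_model(f"What is {a} + {b}? Reply with only the number.")
            if answer.strip() == str(a + b):
                correct += 1
        results[digits] = correct / trials_per_size
    return results

# The claim above predicts a curve that stays near 1.0 and then drops off a
# cliff around ~8 digits for current LLMs, versus a slow, gradual decline for
# humans given pencil and paper.
```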