Because “intelligence”, in terms like IQ that make sense to a human being, is not a property of the algorithm; it’s (as far as my investigations can tell) a function of:

- FLOPS (how many computational operations can be done in a period of wall-clock time)
- Memory space (and thus, how large the knowledge base of models can get)
- Compression/generalization power (which actually requires solving difficult information-theoretic and algorithmic problems)
So basically, if you just keep giving your AGI more CPU power and storage space, I do think it will cross over into something dangerously like superintelligence, which I think really just reduces to:

- Building and utilizing a superhuman base of domain knowledge
- Doing so more quickly than a human being can
- With greater surety than a human being can obtain
There is no gap-in-kind between your reasoning abilities and those of a dangerously superintelligent AGI. It just has a lot more resources for doing the same kinds of stuff.
An easy analogy for beginners shows up the first time you read about sampling-based computational Bayesian statistics: the accuracy of the probabilities inferred depends directly on the sample size. Since additional computational power can always be put towards more samples on the margin, you can always get your inferred estimates marginally closer to the real probabilities just by adding compute time.
> Since additional computational power can always be put towards more samples on the margin, you can always get your inferred estimates marginally closer to the real probabilities just by adding compute time.
By adding exponentially more time.
Computational complexity can’t simply be waved away by saying “add more time/memory”.
A) I did say marginally.
B) It’s a metaphor intended to convey the concept to people without the technical education to know or care where the diminishing returns line is going to be.
C) As a matter of fact, in sampling-based inference, computation time scales linearly with sample size: you’re just running the same code n times with n different random parameter values. There will be diminishing returns to sample size once you’ve got a large enough n for relative frequencies in the sample to get within some percentage of the real probabilities, but actually adding more is a linearly-scaling cost.
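To make that linear-cost point concrete, here is a minimal sketch (a toy Monte Carlo estimate of a uniform mean, not any particular inference library): estimating an expectation from n draws is just the same unit of work repeated n times.

```python
import random

def mc_mean(n, seed=0):
    """Estimate E[X] for X ~ Uniform(0, 1) by averaging n random draws."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# Each extra sample is the same fixed unit of work, so total cost is
# linear in n: doubling the sample size doubles the run time.
estimate = mc_mean(100_000)  # true mean is 0.5
```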
> It’s a metaphor intended to convey the concept to people without the technical education to know or care where the diminishing returns line is going to be.
The problem is that it conveys the concept in a very misleading way.
No, it does not. In sampling-based inference, the necessary computation time grows linearly with the demanded sample size, not exponentially. There may be diminishing returns to increasingly accurate probabilities, but that’s a fact about your utility function rather than an exponential increase in necessary computational power.
This precise switch, from an exponential computational-cost growth curve to a linear one, is why sampling-based inference has given us a renaissance in Bayesian statistics.
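The switch can be illustrated with a back-of-the-envelope cost count (the grid resolution k and dimension d below are hypothetical parameters, picked only for illustration): evaluating a posterior on a grid costs k^d points, while a Monte Carlo sample of size n costs work linear in n and d.

```python
def grid_cost(k, d):
    """Posterior evaluations needed for a grid with k points per axis
    in d dimensions: exponential in d."""
    return k ** d

def sampling_cost(n, d):
    """Work for n Monte Carlo draws, each touching d coordinates:
    linear in both n and d."""
    return n * d

# A modest 10-point grid in 20 dimensions already needs 10**20
# evaluations; a million samples in 20 dimensions need only 2 * 10**7.
```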
> There may be diminishing returns to increasingly accurate probabilities, but that’s a fact about your utility function
This has nothing to do with utility functions.
Sample size is a linear function of CPU time, but the accuracy of the estimates is NOT a linear function of sample size; the Monte Carlo error shrinks only as 1/√n, so there are huge diminishing returns to large sample sizes.
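The scale of those diminishing returns is easy to check empirically: the standard error of a Monte Carlo estimate shrinks as 1/sqrt(n), so halving the error costs four times the samples. A toy check (the uniform-mean estimate again, nothing model-specific):

```python
import math
import random

def rms_error(n, trials=400, seed=1):
    """RMS error of an n-sample Monte Carlo estimate of E[X] = 0.5
    for X ~ Uniform(0, 1), averaged over repeated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        est = sum(rng.random() for _ in range(n)) / n
        total += (est - 0.5) ** 2
    return math.sqrt(total / trials)

# Quadrupling n roughly halves the error: the 1/sqrt(n) law.
ratio = rms_error(100) / rms_error(400)
```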
Ah, ok, fair enough on that one.

> the accuracy of the probabilities inferred depends directly on the sample size. Since additional computational power can always be put towards more samples on the margin, you can always get your inferred estimates marginally closer to the real probabilities just by adding compute time.
Hold on, hold on. There are at least two samples involved.
Sample 1 is your original data sampled from reality. Its size is fixed—additional computational power will NOT get you more samples from reality.
Sample 2 is an intermediate step in “computational Bayesian statistics” (e.g. MCMC). Its size is arbitrary, and yes, you can always increase it by throwing more computational power at the problem.
However, by increasing the size of sample 2 you do NOT get “marginally closer to the real probabilities”; for that you need to increase the size of sample 1. Adding compute time gets you marginally closer only to the asymptotic estimate, which in simple cases you can even calculate analytically.
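The distinction is easy to demonstrate in a conjugate model, where the asymptotic estimate really is analytic. This sketch (a made-up Beta-Binomial coin example; the 20 flips and true p = 0.7 are hypothetical numbers) holds sample 1 fixed and grows sample 2:

```python
import random

# Sample 1: a fixed dataset from reality; 20 coin flips with true p = 0.7.
data_rng = random.Random(42)
true_p = 0.7
flips = [1 if data_rng.random() < true_p else 0 for _ in range(20)]
heads, tails = sum(flips), len(flips) - sum(flips)

# Under a Beta(1, 1) prior the posterior is Beta(1 + heads, 1 + tails),
# so the asymptotic posterior mean is known in closed form:
analytic_mean = (1 + heads) / (2 + len(flips))

# Sample 2: posterior draws. Growing this sample converges on
# analytic_mean, NOT on true_p; only more flips (more of sample 1)
# could move the estimate toward the real probability.
def posterior_mean(n_draws, seed=0):
    rng = random.Random(seed)
    total = sum(rng.betavariate(1 + heads, 1 + tails)
                for _ in range(n_draws))
    return total / n_draws
```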
Yes, there is an asymptotic limit where eventually you just approach the analytic estimator, and need more empirical/sensory data. There are almost always asymptotic limits, usually the “platonic” or “true” full-information probability.
But as I said, it was an analogy for beginners, not a complete description of how I expect a real AI system to work.
> There are at least two samples involved. [Y]our original data sampled from reality [...] is fixed—additional computational power will NOT get you more samples from reality.
That’s true for something embodied as Human v1.0 or, e.g., in a robot chassis, though even then the I/O bound might end up being greatly superhuman -- certainly the most intelligent humans glean much more information from sensory inputs of basically fixed length than the least intelligent can, which suggests to me that the size of our training set is not our limiting factor. But it’s not necessarily true for something that can generate its own sensors and effectors, suitably generalized; depending on architecture, that could end up being CPU-bound or I/O-bound, and I don’t think we understand the problem well enough to say which.
The first thing that comes to mind, scaled up to its initial limits, might look like a botnet running image interpretation over the output of every poorly secured security camera in the world (and there are a lot of them). That would almost certainly be CPU-bound. But there are probably better options out there.
> it’s not necessarily true for something that can generate its own sensors and effectors
Yes, but now we’re going beyond the boundaries of the original comment which talked about how pure computing power (FLOPS + memory) can improve things. If you start building physical things (sensors and effectors), it’s an entirely different ball game.
Sensors and effectors in an AI context are not necessarily physical. They’re essentially the AI’s inputs and outputs, with a few constraints that are unimportant here; the terminology is a holdover from the days when everyone expected AI would be used primarily to run robots. We could be talking about web crawlers and Wikipedia edits, for example.
Fair point, though physical reality is still physical reality. If you need a breakthrough in building nanomachines, for example, you don’t get there by crawling the web really really fast.