For a non-uniform distribution we can use a similar formula, (1.0 - p(before 2011)) / (1.0/0.9 - p(before 2011)). This is analogous to adding an extra blob of (uncounted) probability density, such that if the AI is "actually built" anywhere within the distribution, including the uncounted bit, the prior probability (0.9) is the ratio (counted) / (counted + uncounted), and then cutting off the part where we know the AI has not been built.
For a normal(mu = 2050, sigma = 10) distribution, in Haskell this is let ai year = (let p = cumulative (normalDistr 2050 (10^2)) year in (1.0 - p) / (1.0/0.9 - p))¹. Evaluating on a few different years (a fuller runnable sketch follows the numbers below):
P(AI|not by 2011) = 0.899996
P(AI|not by 2030) = 0.8979
P(AI|not by 2050) = 0.8181...
P(AI|not by 2070) = 0.16995
P(AI|not by 2080) = 0.012
P(AI|not by 2099) = 0.00028
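To make that expression self-contained, here is a minimal runnable sketch of the same calculation. It assumes the Hackage statistics package; as far as I can tell, recent versions of normalDistr take the standard deviation rather than the variance, hence 10 below versus the 10^2 in the expression above, and the helper names are mine.

```haskell
import Statistics.Distribution (cumulative)
import Statistics.Distribution.Normal (normalDistr)

-- P(AI is ever built | no AI by `year`), with a 0.9 prior that AI is
-- possible at all and a normal(mean 2050, sd 10) distribution over when.
ai :: Double -> Double
ai year = (1.0 - p) / (1.0 / 0.9 - p)
  where p = cumulative (normalDistr 2050 10) year

main :: IO ()
main = mapM_ (\y -> putStrLn (show y ++ ": " ++ show (ai y)))
             [2011, 2030, 2050, 2070, 2080, 2099]
```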
This drops off far faster than the uniform case once 2050 is reached. We can also use this survey as an interesting source for a distribution. The median estimate for the year at which P=0.5 is 2050, which gives us the same mu, and the median for P=0.1 was 2028, which fits with sigma ~ 17 years². We also have P=0.9 by 2150, suggesting our prior of 0.9 is in the ballpark. Plugging the same years into the new distribution:
P(AI|not by 2011) = 0.899
P(AI|not by 2030) = 0.888
P(AI|not by 2050) = 0.8181...
P(AI|not by 2070) = 0.52
P(AI|not by 2080) = 0.26
P(AI|not by 2099) = 0.017
Even by 2030 our confidence will have changed little.
¹ Using Statistics.Distribution.Normal from Hackage.
² Technically, the survey seems to have asked about unconditional probabilities, not probabilities conditional on AI being possible, whereas the latter is what we want. We may then want to actually fit a normal distribution so that cdf(2028) = 0.1/0.9 and cdf(2050) = 0.5/0.9, which would be a bit harder (we can't just use 2050 as mu).
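For the record, both fits are one-liners given the standard normal quantile function. A sketch, again assuming the statistics package (quantile and standard live in Statistics.Distribution and Statistics.Distribution.Normal); the function names are mine:

```haskell
import Statistics.Distribution (quantile)
import Statistics.Distribution.Normal (standard)

-- Unconditional fit: the survey's 10th percentile (2028) sits about
-- 1.28 standard deviations below the median (2050).
surveySigma :: Double
surveySigma = (2050 - 2028) / negate (quantile standard 0.1)   -- ~17.2 years

-- Conditional fit from footnote 2: solve cdf(2028) = 0.1/0.9 and
-- cdf(2050) = 0.5/0.9 for (mu, sigma).
surveyFit :: (Double, Double)
surveyFit = (2050 - z50 * s, s)
  where z10 = quantile standard (0.1 / 0.9)   -- ~ -1.22
        z50 = quantile standard (0.5 / 0.9)   -- ~  0.14
        s   = (2050 - 2028) / (z50 - z10)     -- ~ 16 years
```

Conditioning barely moves the numbers here: mu drops by roughly a couple of years (to ~2048) and sigma tightens by about a year.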
The intuitive explanation for why the normal distribution drops off faster is that it makes such strong predictions about the region around 2050: once you've reached 2070 with no AI, you've 'wasted' most of your possible drawers, to continue the original blog post's metaphor.
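To put a number on "most of the drawers": under the normal(2050, 10) distribution, the fraction of drawers already opened by a given year is just the CDF at that year. A tiny sketch, with the same assumptions about the statistics package as above:

```haskell
import Statistics.Distribution (cumulative)
import Statistics.Distribution.Normal (normalDistr)

-- Fraction of the equal-probability "drawers" already opened by `year`.
drawersOpened :: Double -> Double
drawersOpened = cumulative (normalDistr 2050 10)
-- drawersOpened 2070 ~ 0.977: by 2070 nearly 98% of the drawers are behind
-- us, which is why confidence collapses so quickly after 2050.
```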
To get a visual analogue of the probability mass, you could map the normal curve onto a uniform distribution, something like 'if we imagine each year at the peak corresponds to 30 years in a uniform version, then it's like we were looking at the period 1500-2100 AD, so 2070 is very late in the game indeed!' To give a crude ASCII diagram, the mapped normal curve would look like this, where every space/column is one equal chance to make AI:
Cool! Would it be easy for you to repeat this replacing the normal distribution with an exponential distribution? I think that’s a more natural way to model “waiting for something”.
You're right, the probability should drop off in a kind of exponential curve, since an AI only "gets created at year X" if it hasn't been made before X. I did some thinking, and I think I can do one better. We can model the creation of the AI as, for the most part, a succession of "technological breakthroughs", i.e. insights about algorithms or processors or whatever, unpredictable in advance, that allow the project to "proceed to the next step".
Each step can have an exponential distribution for when it will be completed, all (for simplicity) with the same average, set so that the average time for the final distribution will be 50 years (from the year 2000). The final distribution is then P(totaltime = x) = P(sum{time for each step} = x), which is just the repeated convolution of the distribution for each step. The density distribution turns out to be fairly funky:
P(totaltime = x) = a^n * x^(n-1) * e^(-a*x) / (n-1)!,
where n is the number of steps involved and a is the rate parameter of the exponential distributions, which for our purposes is n/50 so that the mean of the pdf is 50 years. For n=1 this is of course just the exponential distribution. For n=5 we get a distribution something like this. The cumulative distribution function is actually a bit nicer in a way, just:
P(totaltime ≤ x) = γ(n, a*x) / (n-1)!,
where γ(s, x) is the lower incomplete gamma function, or Γ(s) - Γ(s, x), which is normalized by (n-1)!. Ok, let’s plug in some numbers. The relevant Haskell expression is let ai n x = (let a = (n/50); p = incompleteGamma n (a*x) in (1.0 - p)/(1.0/0.9 - p))¹ where x is years since 2000 without AI. Then for the simple exponential case:
P(AI|not by 2011, n=1) = 0.88
P(AI|not by 2030, n=1) = 0.83
P(AI|not by 2050, n=1) = 0.77
P(AI|not by 2070, n=1) = 0.69
P(AI|not by 2080, n=1) = 0.65
P(AI|not by 2099, n=1) = 0.55
We seem to lose confidence at a somewhat constant gradual rate. By 2150 our probability is still 0.30 though, and it’s only by 2300 that it drops to ~2%. Perhaps I need to just cut off the distribution by 2100. Anyway, for n=5 we have more conclusive results:
P(AI|not by 2011, n=5) = 0.8995
P(AI|not by 2030, n=5) = 0.88
P(AI|not by 2050, n=5) = 0.80
P(AI|not by 2070, n=5) = 0.61
P(AI|not by 2080, n=5) = 0.47
P(AI|not by 2099, n=5) = 0.22
P(AI|not by 2150, n=5) = 0.0077
So we don’t change our confidence much until 2050, then quickly lose confidence, as that area contains “most of the drawers”, metaphorically. We will have pretty much disproved strong AI by 2150.
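A self-contained version of that expression, for anyone who wants to reproduce the table. This assumes the regularized lower incomplete gamma is available as incompleteGamma (Statistics.Math in the old statistics versions; Numeric.SpecFunctions in the math-functions package nowadays), and the wrapper names are mine:

```haskell
import Numeric.SpecFunctions (incompleteGamma)

-- P(AI is ever built | no AI by year 2000 + x), modelling AI as n
-- exponential breakthrough steps with a combined mean of 50 years and
-- a 0.9 prior that AI is possible at all.
ai :: Double -> Double -> Double
ai n x = (1.0 - p) / (1.0 / 0.9 - p)
  where a = n / 50                      -- rate per step, so the total mean is 50 years
        p = incompleteGamma n (a * x)   -- regularized Erlang CDF

main :: IO ()
main = mapM_ (\x -> putStrLn (show (2000 + x) ++ ": " ++ show (ai 5 x)))
             [11, 30, 50, 70, 80, 99, 150]
```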
¹ Using the Statistics package again, specifically Statistics.Math. Note that incompleteGamma is already normalized, so we don't need to divide by (n-1)!.
This is great. The fact that P(AI) is dropping off faster for large n than for small n is a little counterintuitive, right?
Isn't it just the law of large numbers?
This isn't even related to the law of large numbers, which says that if you flip many coins you expect to get close to half heads and half tails. This is as opposed to flipping one coin, where you always get either 100% heads or 100% tails.
I personally expected that P(AI) would drop off roughly linearly as n increased, so this certainly seems counter-intuitive to me.
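For what it's worth, one quick back-of-the-envelope way to see where the faster drop-off comes from (a calculation I'm adding, not from the figures above): each step is exponential with mean 50/n years, so the total time has mean n * (50/n) = 50 years but variance n * (50/n)^2 = 2500/n, i.e. a standard deviation of 50/sqrt(n) years. For n = 1 that spread is 50 years; for n = 5 it is only about 22, so the n = 5 model packs most of its probability mass into the decades around 2050 and, like the normal distribution earlier, gives up quickly once that window has passed.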