This sounds like a probability search problem in which you don’t know for sure there exists anything to find—the hope function.
I worked through this in #lesswrong with nialo. It’s interesting to work with various versions of this. For example, suppose you had a uniform distribution for AI’s creation over 2000-2100, and you believe its creation 90% possible. It is of course now 2011, so how much do you believe it is possible now given its failure to appear between 2000 and now? We could write that in Haskell as let fai x = (100-x) / ((100 / 0.9) - x) in fai 11 which evaluates to ~0.889 - so one’s faith hasn’t been much damaged.
One of the interesting things is how slowly one’s credence in AI being possible declines. If you run the function fai 50*, it’s 81%. fai 90** = 47%! But then by fai 98 it has suddenly shrunk to 15% and so on for fai 99 = 8%, and fai 100 is of course 0% (since now one has disproven the possibility).
* no AI by 2050
** no AI by 2090, etc.
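The uniform-case hope function can be written out as a small standalone definition (this is the fai expression from above; only the type signature is added):

```haskell
-- Hope function for a uniform prior: if AI is possible (prior 0.9),
-- its creation date is uniform over 2000-2100.
-- x = years since 2000 with no AI yet.
fai :: Double -> Double
fai x = (100 - x) / ((100 / 0.9) - x)
```

So fai 11 ≈ 0.889, fai 50 ≈ 0.818, and fai 100 = 0, matching the figures above.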
EDIT: Part of what makes this interesting is that one of the common criticisms of AI is ‘look at them, they were wrong about AI being possible in 19xx, how sad and pathetic that they still think it’s possible!’ The hope function shows that unless one is highly confident about AI showing up in the early part of a time range, the failure of AI to show up ought to damage one’s belief only a little.
That blog post is also interesting from a mind projection fallacy viewpoint:
“What I found most interesting was, the study provides evidence that people seem to reason as though probabilities were physical properties of matter. In the example with the desk with the eight drawers and an 80% chance a letter is in the desk, many people reasoned as though “80% chance-of-letter” was a fundamental property of the furniture, up there with properties like weight, mass, and density.
Many reasoned that the odds the desk has the letter stay 80% throughout the fruitless search. Thus, they reasoned, it would still be 80% even if they searched seven drawers and found no letter. And these were people with some education about probability! One problem is that people tended to overcompensate to avoid falling into the Gambler’s Fallacy. They were educated people, and they knew that the probability of a fair coin falling heads remains 50%, no matter how many times in a row heads have already come up. They seemed to generalize this to the letter search. There’s an important difference, though: the coin flips are independent of each other. The drawer searches are not.
In a followup study, when the modified questions were posed, with two extra “locked” drawers and a 100% initial probability of a letter, miraculously the respondents’ answers showed dramatic improvement. Even though, formally, the exercises were isomorphic.”
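To make the difference from coin flips concrete, here is a minimal Bayes update for the desk example (the function name pLetter is mine): prior 0.8 that the letter is in the desk, uniform over the 8 drawers if it is present.

```haskell
-- P(letter is in the desk | first k of 8 drawers searched and found empty).
-- Unlike coin flips, each empty drawer is evidence: it eliminates one of
-- the places the letter could have been.
pLetter :: Double -> Double
pLetter k = let hit = 0.8 * (8 - k) / 8  -- letter present, in an unsearched drawer
            in hit / (hit + 0.2)         -- normalize against "no letter at all"
```

This gives 0.8 before searching, about 0.67 after four empty drawers, and 1/3 (not 0.8) after seven.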
For a non-uniform distribution we can use the analogous formula (1.0 - p(before 2011)) / (1.0/0.9 - p(before 2011)). This amounts to adding an extra blob of (uncounted) probability density, such that if the AI is “actually built” anywhere within the distribution, including the uncounted bit, the prior probability (0.9) is the ratio (counted) / (counted + uncounted), and then cutting off the part where we know the AI to have not been built.
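This formula is just Bayes’ theorem. Writing q = 0.9 for the prior that AI is possible, and p for the probability, given that it is possible, that it would already have been built by now:

```latex
P(\text{possible} \mid \text{no AI yet})
  = \frac{q\,(1-p)}{q\,(1-p) + (1-q)}
  = \frac{1-p}{1/q - p}
```

With q = 0.9 this is exactly (1.0 - p) / (1.0/0.9 - p), and the uniform case is recovered by setting p = x/100.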
For a normal(mu = 2050, sigma = 10) distribution, in Haskell this is let ai year = (let p = cumulative (normalDistr 2050 (10^2)) year in (1.0 - p) / (1.0/0.9 - p))¹. (Here 10^2 is the variance; current versions of the statistics package’s normalDistr take the standard deviation instead, so today it would be normalDistr 2050 10.) Evaluating on a few different years:
P(AI|not by 2011) = 0.899996
P(AI|not by 2030) = 0.8979
P(AI|not by 2050) = 0.8181...
P(AI|not by 2070) = 0.16995
P(AI|not by 2080) = 0.012
P(AI|not by 2099) = 0.00028
This drops off far faster than the uniform case, once 2050 is reached. We can also use this survey as an interesting source for a distribution. The median estimate for P=0.5 is 2050, which gives us the same mu, and the median for P=0.1 was 2028, which fits with sigma ~ 17 years². We also have P=0.9 by 2150, suggesting our prior of 0.9 is in the ballpark. Plugging the same years into the new distribution:
P(AI|not by 2011) = 0.899
P(AI|not by 2030) = 0.888
P(AI|not by 2050) = 0.8181...
P(AI|not by 2070) = 0.52
P(AI|not by 2080) = 0.26
P(AI|not by 2099) = 0.017
Even by 2030 our confidence will have changed little.
¹Using Statistics.Distribution.Normal from Hackage.
²Technically, the survey seems to have asked about unconditional probabilities, not probabilities conditional on AI being possible, whereas the latter is what we want. We might then want to actually fit a normal distribution so that cdf(2028) = 0.1/0.9 and cdf(2050) = 0.5/0.9, which would be a bit harder (we couldn’t just use 2050 as mu).
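Both tables can be reproduced without the statistics package with a self-contained sketch: a hope function parameterized by an arbitrary CDF, plus a normal CDF built from the Abramowitz-Stegun approximation of erf (accurate to about 1.5e-7, plenty for these figures; the function names are mine):

```haskell
-- Hope function with prior 0.9, for any CDF over AI's creation date.
hope :: (Double -> Double) -> Double -> Double
hope cdf year = let p = cdf year in (1 - p) / (1 / 0.9 - p)

-- Normal CDF via the Abramowitz-Stegun erf approximation (formula 7.1.26).
normalCdf :: Double -> Double -> Double -> Double
normalCdf mu sigma x = 0.5 * (1 + erf ((x - mu) / (sigma * sqrt 2)))
  where
    erf z
      | z < 0     = negate (erf (negate z))
      | otherwise =
          let t    = 1 / (1 + 0.3275911 * z)
              poly = t * (0.254829592 + t * ((-0.284496736) + t *
                       (1.421413741 + t * ((-1.453152027) + t * 1.061405429))))
          in 1 - poly * exp (negate (z * z))
```

For example, hope (normalCdf 2050 10) 2070 reproduces the ~0.17 above, and hope (normalCdf 2050 17) 2070 the ~0.52.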
The intuitive explanation for why the normal distribution drops off faster is that it makes such strong predictions about the region around 2050: once you’ve reached 2070 with no AI, you’ve ‘wasted’ most of your possible drawers, to continue the original blog post’s metaphor.
To get a visual analogue of the probability mass, you could map the normal curve onto a uniform distribution, something like ‘if we imagine each year at the peak corresponds to 30 years in a uniform version, then it’s like we were looking at the period 1500-2100 AD, so 2070 is very late in the game indeed!’ In a crude ASCII diagram, the mapped normal curve would be drawn so that every space/column is one equal chance to make AI.
Cool! Would it be easy for you to repeat this replacing the normal distribution with an exponential distribution? I think that’s a more natural way to model “waiting for something”.
You’re right: the probability should drop off in a kind of exponential curve, since an AI only “gets created at year X” if it hasn’t been made before X. I did some thinking, and I think I can do one better. We can model the creation of the AI as, for the most part, a succession of “technological breakthroughs”, i.e. insights, unpredictable in advance, about algorithms or processors or whatever that allow the project to “proceed to the next step”.
Each step can have an exponential distribution for when it will be completed, all (for simplicity) with the same average, set so that the average time for the final distribution will be 50 years (from the year 2000). The final distribution is then P(totaltime = x) = P(sum{time for each step} = x) which is just the repeated convolution of the distribution for each step. The density distribution turns out to be fairly funky:
pdf(x) = a^n x^(n-1) e^(-ax) / (n-1)!
where n is the number of steps involved and a is the parameter to the exponential distributions, which for our purposes is n/50 so that the mean of the pdf is 50 years. For n=1 this is of course just the exponential distribution. For n=5 we get a distribution something like this. The cumulative distribution function is actually a bit nicer in a way, just:
cdf(x) = γ(n, ax) / (n-1)!
where γ(s, x) is the lower incomplete gamma function, or Γ(s) - Γ(s, x), which is normalized by (n-1)!. Ok, let’s plug in some numbers. The relevant Haskell expression is let ai n x = (let a = (n/50); p = incompleteGamma n (a*x) in (1.0 - p)/(1.0/0.9 - p))¹ where x is years since 2000 without AI. Then for the simple exponential case:
P(AI|not by 2011, n=1) = 0.88
P(AI|not by 2030, n=1) = 0.83
P(AI|not by 2050, n=1) = 0.77
P(AI|not by 2070, n=1) = 0.69
P(AI|not by 2080, n=1) = 0.65
P(AI|not by 2099, n=1) = 0.55
We seem to lose confidence at a somewhat constant gradual rate. By 2150 our probability is still 0.30 though, and it’s only by 2300 that it drops to ~2%. Perhaps I need to just cut off the distribution by 2100. Anyway, for n=5 we have more conclusive results:
P(AI|not by 2011, n=5) = 0.8995
P(AI|not by 2030, n=5) = 0.88
P(AI|not by 2050, n=5) = 0.80
P(AI|not by 2070, n=5) = 0.61
P(AI|not by 2080, n=5) = 0.47
P(AI|not by 2099, n=5) = 0.22
P(AI|not by 2150, n=5) = 0.0077
So we don’t change our confidence much until 2050, then quickly lose confidence, as that area contains “most of the drawers”, metaphorically. We will have pretty much disproved strong AI by 2150.
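The n-step model can also be written without the statistics dependency, since for integer n the regularized incomplete gamma has the closed form P(n, x) = 1 - e^(-x) Σ_{k=0}^{n-1} x^k/k! (the function names here are mine):

```haskell
-- CDF of an Erlang(n, a) distribution: the sum of n i.i.d. exponential
-- steps with rate a, via the closed form of the regularized incomplete gamma.
erlangCdf :: Int -> Double -> Double -> Double
erlangCdf n a x =
  1 - exp (-a * x) * sum [ (a * x) ^^ k / fromIntegral (product [1 .. k])
                         | k <- [0 .. n - 1] ]

-- Hope function: prior 0.9, mean time-to-AI fixed at 50 years from 2000
-- by setting the rate to n/50; x = years since 2000 without AI.
ai :: Int -> Double -> Double
ai n x = let p = erlangCdf n (fromIntegral n / 50) x
         in (1 - p) / (1 / 0.9 - p)
```

Here ai 1 11 and ai 5 50 reproduce the ~0.88 and ~0.80 above.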
¹ Using the Statistics package again, specifically Statistics.Math. Note that incompleteGamma is already normalized, so we don’t need to divide by (n-1)!.
This is great. The fact that P(AI) is dropping off faster for large n than for small n is a little counterintuitive, right?

Isn’t it just the law of large numbers?
This isn’t even related to the law of large numbers, which says that if you flip many coins you expect to get close to half heads and half tails. This is as opposed to flipping 1 coin, where you expect to always get either 100% heads or 100% tails.
I personally expected that P(AI) would drop-off roughly linearly as n increased, so this certainly seems counter-intuitive to me.
Incidentally, I’ve tried to apply the hope function to my recent essay on Folding@home: http://www.gwern.net/Charity%20is%20not%20about%20helping#updating-on-evidence