On one interpretation of the question: if you’re hallucinating then you aren’t in fact seeing ghosts, you’re just imagining that you’re seeing ghosts. The question isn’t asking about those scenarios, it’s only asking what you should believe in the scenarios where you really do see ghosts.
My updated list after some more work yesterday is
96286, 9344, 107278, 68204, 905, 23565, 8415, 62718, 83512, 16423, 42742, 94304
which I see is the same as simon’s list, with very slight differences in the order
More on my process:
I initially modeled location just by a k nearest neighbors calculation, assuming that a site’s location value equals the average residual of its k nearest neighbors (with location transformed to Cartesian coordinates). That, along with linear regression predicting log(Performance), got me my first list of answers. I figured that list was probably good enough to pass the challenge: the sites’ predicted performance had a decent buffer over the required cutoff, the known sites with large predicted values did mostly have negative residuals but they were only about 1⁄3 the size of the buffer, there were some sites with large negative residuals but none among the sites with high predicted values and I probably even had a big enough buffer to withstand 1 of them sneaking in, and the nearest neighbors approach was likely to mainly err by giving overly middling values to sites near a sharp border (averaging across neighbors on both sides of the border) which would cause me to miss some good sites but not to include any bad sites. So it seemed fine to stop my work there.
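For anyone who wants to try the same approach, here is a minimal sketch of that first model, not my actual code. The function names, the unit-sphere conversion, and the default `k=5` are all assumptions made for illustration. It fits ordinary least squares to log(Performance) and then takes a new site's location value to be the average residual of its k nearest known sites.

```python
import numpy as np

def to_cartesian(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to (x, y, z) points on a unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])

def fit_log_linear(X, performance):
    """Least-squares fit of log(Performance) on X; returns coefficients, intercept first."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.log(performance), rcond=None)
    return coef

def predict_log(X, coef):
    """Predicted log(Performance) for each row of X."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X]) @ coef

def knn_location_value(train_xyz, train_resid, query_xyz, k=5):
    """Average residual of the k nearest training sites, for each query site."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    d = np.linalg.norm(query_xyz[:, None, :] - train_xyz[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.asarray(train_resid, dtype=float)[nearest].mean(axis=1)
```

The residuals passed in would be `np.log(performance) - predict_log(X, coef)` on the known sites. If you score a known site this way, its nearest neighbor is itself, so you would want to drop that column first.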
Yesterday I went back and looked at the residuals and added some more handcrafted variables to my model to account for any visible patterns. The biggest was the sharp cutoff at Latitude ±36. I also changed my rescaling of Murphy’s Constant (because my previous attempt had negative residuals for low Murphy values), added a quadratic term to my rescaling of Local Value of Pi (because the dropoff from 3.15 isn’t linear), added a Shortitude cutoff at 45, and added a cos(Longitude-50) variable. Still kept the nearest neighbors calculation to account for any other location relevance (there is a little but much less now). That left me with 4 nines of correlation between predicted & actual performance, residuals near zero for the highest predicted sites in the training set, and this new list of sites. My previous lists of sites still seem good enough, but this one looks better.
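A rough sketch of what handcrafted variables like these could look like in code. This is illustrative only: the feature names are hypothetical, the inclusive/exclusive sides of the cutoffs are guesses, longitude is assumed to be in degrees, and the Pi rescaling is assumed to be distance from 3.15. The Murphy's Constant rescaling is left out because nothing above pins down its form.

```python
import math

def handcrafted_features(latitude, longitude, shortitude, local_pi):
    """Build illustrative location/Pi features for one site (all names hypothetical)."""
    pi_dist = abs(local_pi - 3.15)  # assumed rescaling: distance from 3.15
    return {
        # Indicator for the sharp cutoff at Latitude +-36 (which side is better is left to the fit).
        "lat_within_36": 1.0 if abs(latitude) <= 36 else 0.0,
        # Indicator for the Shortitude cutoff at 45.
        "short_below_45": 1.0 if shortitude < 45 else 0.0,
        # Smooth longitude effect peaking at Longitude 50 (degrees assumed).
        "cos_lon_minus_50": math.cos(math.radians(longitude - 50)),
        # Linear and quadratic terms so the dropoff from 3.15 need not be linear.
        "pi_dist": pi_dist,
        "pi_dist_sq": pi_dist ** 2,
    }
```

These columns would sit alongside the other predictors in the regression on log(Performance), with the regression deciding the sign and size of each effect.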
Did a little robustness check, and I’m going to swap out 3 of these to make it:
96286, 23565, 68204, 905, 93762, 94408, 105880, 9344, 8415, 62718, 80395, 65607
To share some more:
I came across this puzzle via aphyer’s post, and got inspired to give it a try.
Here is the fit I was able to get on the existing sites (Performance vs. Predicted Performance). Some notes on it:
Seems good enough to run with. None of the highest predicted existing sites had a large negative residual, and the highest predicted new sites give some buffer.
Three observations I made along the way.
First (which is mostly redundant with what aphyer wound up sharing in his second post):
Almost every variable is predictive of Performance on its own, but none of the continuous variables have a straightforward linear relationship with Performance.
Modeling the effect of location could be tricky. e.g., Imagine on Earth if Australia and Mexico were especially good places for Performance, or on a checkerboard if Performance was higher on the black squares.
The ZPPG Performance variable has a skewed distribution which does not look like what you’d get if you were adding a bunch of variables, but does look like something you might get if you were multiplying several variables. And multiplication seems plausible for this scenario, e.g. perhaps such-and-such a disturbance halves Performance and this other factor cuts performance by a quarter.
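A quick way to see the point about skew is a toy simulation, sketched below. The setup is made up for illustration: each site gets a handful of independent factors drawn uniformly between 0.5 and 1.0 (so each one cuts Performance by somewhere between nothing and half), and we compare the skewness of their sum with the skewness of their product.

```python
import math
import random
import statistics

def skewness(xs):
    """Population skewness: mean of cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def simulate(n_sites=20000, n_factors=6, seed=0):
    """Return (sums, products) of n_factors uniform(0.5, 1.0) draws per site."""
    rng = random.Random(seed)
    sums, prods = [], []
    for _ in range(n_sites):
        factors = [rng.uniform(0.5, 1.0) for _ in range(n_factors)]
        sums.append(sum(factors))
        prods.append(math.prod(factors))
    return sums, prods

sums, prods = simulate()
print("skewness of sums:    ", round(skewness(sums), 3))
print("skewness of products:", round(skewness(prods), 3))
```

The sums come out roughly symmetric while the products have a clear right skew, which is the pattern that makes modeling log(Performance) a natural choice.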
My current choices (in order of preference) are
96286, 23565, 68204, 905, 93762, 94408, 105880, 8415, 94304, 42742, 92778, 62718
What’s “Time-Weighted Probability”? Is that just the average probability across the lifespan of the market? That’s not a quantity which is supposed to be calibrated.
e.g., Imagine a simple market on a coin flip, where forecasts of p(heads) are made at two times: t1 before the flip and t2 after the flip is observed. In half of the cases, the market forecast is 50% at t1 and 100% at t2, for an average of 75%; in those cases the market always resolves True. The other half: 50% at t1, 0% at t2, avg of 25%, market resolves False. The market is underconfident if you take this average, but the market is perfectly calibrated at any specific time.
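The coin-flip example is small enough to check by direct enumeration. The sketch below (helper name is my own) groups the cases by forecast and reports how often the market resolves True at each forecast level, first at t1, then at t2, then for the time-averaged number.

```python
from fractions import Fraction

# Each case: (share of markets, forecast at t1, forecast at t2, outcome as 1/0).
cases = [
    (Fraction(1, 2), Fraction(1, 2), Fraction(1), 1),  # flip came up heads
    (Fraction(1, 2), Fraction(1, 2), Fraction(0), 0),  # flip came up tails
]

def resolution_rate_by_forecast(triples):
    """Map each forecast value to the weighted share of markets that resolved True."""
    totals = {}
    for weight, forecast, outcome in triples:
        w_sum, y_sum = totals.get(forecast, (0, 0))
        totals[forecast] = (w_sum + weight, y_sum + weight * outcome)
    return {f: y_sum / w_sum for f, (w_sum, y_sum) in totals.items()}

at_t1 = resolution_rate_by_forecast([(w, f1, y) for w, f1, f2, y in cases])
at_t2 = resolution_rate_by_forecast([(w, f2, y) for w, f1, f2, y in cases])
time_avg = resolution_rate_by_forecast([(w, (f1 + f2) / 2, y) for w, f1, f2, y in cases])

for label, table in [("t1", at_t1), ("t2", at_t2), ("time average", time_avg)]:
    for forecast, rate in sorted(table.items()):
        print(f"{label}: forecast {forecast} resolved True at rate {rate}")
```

At each single time the forecast matches the resolution rate, but the averaged forecasts of 3/4 and 1/4 resolve True at rates 1 and 0.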
Have you looked at other ways of setting up the prior to see if this result still holds? I’m worried that the way you’ve set up the prior is not very natural, especially if (as it looks at first glance) the Stable scenario forces p(Heads) = 0.5 and the other scenarios force p(Heads|Heads) + p(Heads|Tails) = 1. Seems weird to exclude “this coin is Headsy” from the hypothesis space while including “This coin is Switchy”.
Thinking about what seems most natural for setting up the prior: the simplest scenario is where flips are serially independent. You only need one number to characterize a hypothesis in that space, p(Heads). So you can have some prior on this hypothesis space (serially independent flips), and some prior on p(Heads) for hypotheses within this space. Presumably that prior should be centered at 0.5 and symmetric. There’s some choice about how spread out vs. concentrated to make it, but if it just puts all the probability mass at 0.5 that seems too simple.
The next simplest hypothesis space is where there is serial dependence that only depends on the most recent flip. You need two numbers to characterize a hypothesis in this space, which could be p(Heads|Heads) and p(Heads|Tails). I guess it’s simplest for those to be independent in your prior, so that (conditional on there being serial dependence), getting info about p(Heads|Heads) doesn’t tell you anything about p(Heads|Tails). In other words, you can simplify this two dimensional joint distribution to two independent one-dimensional distributions. (Though in real-world scenarios my guess is that these are positively correlated, e.g. if I learned that p(Prius|Jeep) was high that would probably increase my estimate of p(Prius|Prius), even assuming that there is some serial dependence.) For simplicity you could just give these the same prior distribution as p(Heads) in the serial independence case.
I think that’s a rich enough hypothesis space to run the numbers on. In this setup, Sticky hypotheses are those where p(Heads|Heads)>p(Heads|Tails), Switchy are the reverse, Headsy are where p(Heads|Heads)+p(Heads|Tails)>1, Tailsy are the reverse, and Stable are where p(Heads|Heads)=p(Heads|Tails) and get a bunch of extra weight in the prior because they’re the only ones in the serially independent space of hypotheses.
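Here is a minimal sketch of how one could run the numbers on that setup. Several pieces are assumptions I'm adding for concreteness: a symmetric Beta(a, a) prior with `a=2.0` for p(Heads), p(Heads|Heads) and p(Heads|Tails); a 50/50 prior split between the two hypothesis spaces; and a probability of 0.5 for the first flip in the serial-dependence space. The function returns the predicted probability that the next flip is Heads, along with the posterior weight on serial independence.

```python
from math import exp, lgamma, log

def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def p_next_heads(flips, prior_indep=0.5, a=2.0):
    """flips: non-empty string of 'H'/'T'. Returns (P(next is H), posterior weight on independence)."""
    heads, tails = flips.count("H"), flips.count("T")
    trans = {"HH": 0, "HT": 0, "TH": 0, "TT": 0}  # key = previous flip + current flip
    for prev, cur in zip(flips, flips[1:]):
        trans[prev + cur] += 1

    # Marginal likelihood of the sequence under each hypothesis space.
    log_m_indep = log_beta(a + heads, a + tails) - log_beta(a, a)
    log_m_dep = (log(0.5)  # assumed probability of the first flip
                 + log_beta(a + trans["HH"], a + trans["HT"]) - log_beta(a, a)
                 + log_beta(a + trans["TH"], a + trans["TT"]) - log_beta(a, a))

    w_indep = prior_indep * exp(log_m_indep)
    w_dep = (1 - prior_indep) * exp(log_m_dep)
    post_indep = w_indep / (w_indep + w_dep)

    # Posterior predictive for the next flip within each space.
    pred_indep = (a + heads) / (2 * a + len(flips))
    if flips[-1] == "H":
        pred_dep = (a + trans["HH"]) / (2 * a + trans["HH"] + trans["HT"])
    else:
        pred_dep = (a + trans["TH"]) / (2 * a + trans["TH"] + trans["TT"])

    return post_indep * pred_indep + (1 - post_indep) * pred_dep, post_indep

print(p_next_heads("HTHTHTHTHT"))
print(p_next_heads("HHHHHTTTTT"))
```

Raising `prior_indep` gives the Stable-style hypotheses more of the extra weight described above, and changing `a` controls how concentrated each prior is around 0.5.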
Try memorizing their birthdates (including year).
That might be different enough from what you’ve previously tried to memorize (month & day) to not get caught in the tangle that has developed.
My answer to “If AI wipes out humanity and colonizes the universe itself, the future will go about as well as if humanity had survived (or better)” is pretty much defined by how the question is interpreted. It could swing pretty wildly, but the obvious interpretation seems ~tautologically bad.
Agreed, I can imagine very different ways of getting a number for that, even given probability distributions for how good the future will be conditional on each of the two scenarios.
A stylized example: say that the AI-only future has a 99% chance of being mediocre and a 1% chance of being great, and the human future has a 60% chance of being mediocre and a 40% chance of being great. Does that give an answer of 1% or 60% or something else?
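To make the ambiguity concrete, here is a small sketch that computes a few candidate numbers from the stylized example. The readings are my own guesses at what someone might compute, and the third one assumes the two futures are independent draws, which the question doesn't specify.

```python
ai = {"mediocre": 0.99, "great": 0.01}
human = {"mediocre": 0.60, "great": 0.40}
rank = {"mediocre": 0, "great": 1}  # ordering of outcomes, worst to best

def p_ai_at_least_as_good(ai_dist, human_dist):
    """P(AI outcome >= human outcome), treating the two futures as independent draws."""
    return sum(pa * ph
               for a, pa in ai_dist.items()
               for h, ph in human_dist.items()
               if rank[a] >= rank[h])

readings = {
    "P(AI-only future is great)": ai["great"],
    "P(human future is only mediocre)": human["mediocre"],
    "P(AI outcome >= human outcome), independent draws": p_ai_at_least_as_good(ai, human),
}
for label, value in readings.items():
    print(f"{label}: {value:.3f}")
```

The first two readings recover the 1% and 60% figures, and the third is one version of "something else".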
I’m also not entirely clear on what scenario I should be imagining for the “humanity had survived (or better)” case.
The time on a clock is pretty close to being a denotative statement.
Batesian mimicry is optimized to be misleading, “I’ll get to it tomorrow” is denotatively false, “I did not have sexual relations with that woman” is ambiguous as to its conscious intent to be denotatively false.
Structure Rebel, Content Purist: people who disagree with me are lying (unless they say “I think that”, “My view is”, or similar)
Structure Rebel, Content Neutral: people who disagree with me are lying even when they say “I think that”, “My view is”, or similar
Structure Rebel, Content Rebel: trying to unlock the front door with my back door key is a lie
How do you get a geocentric model with ellipses? Venus clearly does not go in an ellipse around the Earth. Did Riccioli just add a bunch of epicycles to the ellipses?
Googling… oh, it was a Tychonic model, where Venus orbits the sun in an ellipse (in agreement with Kepler), but the sun orbits the Earth.
Kepler’s ellipses wiped out the fully geocentric models where all the planets orbit around the Earth, because modeling their orbits around the Earth still required a bunch of epicycles and such, while modeling their orbits around the sun now involved a simple ellipse rather than just slightly fewer epicycles. But it didn’t straightforwardly, on its own, wipe out the geoheliocentric/Tychonic models where most planets orbit the sun but the sun orbits the Earth.
Here is Yudkowsky (2008) Artificial Intelligence as a Positive and Negative Factor in Global Risk:
Friendly AI is not a module you can instantly invent at the exact moment when it is first needed, and then bolt on to an existing, polished design which is otherwise completely unchanged. The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target. The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem—which proved very expensive to fix, though not global-catastrophic—analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.
Also, chp 25 of HPMOR is from 2010 which is before CFAR.
It came up in the sequences, e.g. here, here, and here.
There are a couple errors in your table of interpretations. For “actual score = subjective expected”, the second half of the interpretation “prediction = 0.5 or prediction = true probability” got put on a new line in the “Comparison score” column instead of staying together in the “Interpretation” column, and similarly for the next one.
I posted a brainstorm of possible forecasting metrics a while back, which you might be interested in. It included one (which I called “Points Relative to Your Expectation”) that involved comparing a forecaster’s (Brier or other) score with the score that they’d expect to get based on their probability.
Sign error: “Tyler Cowen fires back that not only is this inevitable”
--> “not inevitable”
I don’t understand why it took so long to seriously consider the possibility that orbits are ellipses.
It seems that a circle is the simplest, most natural, most elegant hypothesis for the shape of an orbit, and an ellipse is the second-most simple/natural/elegant hypothesis. But instead of checking if an ellipse fit the data, everyone settled for ‘a lot like a circle, but you have to include a bunch of fudge factors to match the observational data’.
Apparently Kepler had a similar view. April 1605 is when he figured out that the orbit of Mars was an ellipse around the sun; two years earlier when he was already in the process of trying to figure out what sort of ovalish shape fit the data, he said that he had considered and rejected the ellipse hypothesis because if the answer was that simple then someone else would’ve figured it out already. This incorrect inadequacy analysis is from a July 1603 letter that Kepler wrote to David Fabricius: “I lack only a knowledge of the geometric generation of the oval or face-shaped curve. [...] If the figure were a perfect ellipse, then Archimedes and Apollonius would be enough.”
I could make some guesses about why it didn’t happen sooner (e.g. the fact that it happened right after Brahe collected his data suggests that poor data quality was a hindrance), but it feels pretty speculative. I wonder if there have been / could be more quantitative analyses of this question, e.g. do we have the data sets that the ancient Greeks used to fit their models, and can we see how well ellipses fit those data sets?
The suspects have since retreated to attempting to sue datacolada (the investigators).
It’s just one of the suspects (Gino) who is suing, right?