Author of meaningness.com, vividness.live, and other things.
MIT AI PhD, successful biotech entrepreneur, and other things.
A collection of advice for graduate students I put together some time ago: http://www.cs.indiana.edu/mit.research.how.to.html
It was meant specifically for people at the MIT AI Lab, but much of it is applicable to other STEM fields.
A collection of collections of advice for graduate students! http://vlsicad.ucsd.edu/Research/Advice/
Can you recommend an explanation of the complete class theorem(s)? Preferably online. I’ve been googling pretty hard and I’ve turned up almost nothing. I’d like to understand what conditions they start from (suspecting that maybe the result is not quite as strong as “Bayes Rules!”). I’ve found only one paper, which basically said “what Wald proved is extremely difficult to understand, and probably not what you wanted.”
Thank you very much!
Hi!
I’ve been interested in how to think well since early childhood. When I was about ten, I read a book about cybernetics. (This was in the Oligocene, when “cybernetics” had only recently gone extinct.) It gave simple introductions to probability theory, game theory, information theory, boolean switching logic, control theory, and neural networks. This was definitely the coolest stuff ever.
I went on to MIT, and got an undergraduate degree in math, specializing in mathematical logic and the theory of computation—fields that grew out of philosophical investigations of rationality.
Then I did a PhD at the MIT AI Lab, continuing my interest in what thinking is. My work there seems to have been turned into a surrealistic novel by Ken Wilber, a woo-ish pop philosopher. Along the way, I studied a variety of other fields that give diverse insights into thinking, ranging from developmental psychology to ethnomethodology to existential phenomenology.
I became aware of LW gradually over the past few years, mainly through mentions by people I follow on Twitter. As a lurker, there’s a lot about the LW community I’ve loved. On the other hand, I think some fundamental, generally-accepted ideas here are limited and misleading. I began considering writing about that recently, and posted some musings about whether and how it might be useful to address these misconceptions. (This was perhaps ruder than it ought to have been.) It prompted a reply post from Yvain, and much discussion on both his site and mine.
I followed that up with a more constructive post on aspects of how to think well that LW generally overlooks. In comments on that post, several frequent LW contributors encouraged me to re-post that material here. I may yet do that!
For now, though, I’ve started a sequence of LW articles on the difference between uncertainty and probability. Missing this distinction seems to underlie many of the ways I find LW thinking limited. Currently my outline for the sequence has seven articles, covering technical explanations of this difference, with various illustrations; the consequences of overlooking the distinction; and ways of dealing with uncertainty when probability theory is unhelpful.
(Kaj Sotala has suggested that I ask for upvotes on this self-introduction, so I can accumulate enough karma to move the articles from Discussion to Main. I wouldn’t have thought to ask that myself, but he seems to know what he’s doing here! :-)
O&BTW, I also write about contemporary trends in Buddhism, on several web sites, including a serial, philosophical, tantric Buddhist vampire romance novel.
Jeremy, thank you for this. To be clear, I wasn’t suggesting that meta-probability is the solution. It’s a solution. I chose it because I plan to use this framework in later articles, where it will (I hope) be particularly illuminating.
I would take issue with the first section of this article in which you establish single probability (expected utility) calculations as insufficient for the problem.
I don’t think it’s correct to equate probability with expected utility, as you seem to do here. The probability of a payout is the same in the two situations. The point of this example is that the probability of a particular event does not determine the optimal strategy. Because utility is dependent on your strategy, that also differs.
This problem easily succumbs to standard expected value calculations if all actions are considered.
Yes, absolutely! I chose a particularly simple problem, in which the correct decision-theoretic analysis is obvious, in order to show that probability does not always determine optimal strategy. In this case, the optimal strategies are clear (except for the exact stopping condition), and clearly different, even though the probabilities are the same.
I’m using this as an introductory wedge example. I’ve opened a Pandora’s Box: probability by itself is not a fully adequate account of rationality. Many odd things will leap and creep out of that box so long as we leave it open.
Luke, thank you for these pointers! I’ve read some of them, and have the rest open in tabs to read soon.
Thanks, Jonathan, yes, that’s how I understand it.
Jaynes’ discussion motivates A_p as an efficiency hack that allows you to save memory by forgetting some details. That’s cool, although not the point I’m trying to make here.
Jeremy, I think the apparent disagreement here comes from a lack of clarity about the point of my argument. The point was not that this situation can’t be analyzed with decision theory; it certainly can, and I did so. The point is that different decisions have to be made in two situations where the probabilities are the same.
Your discussion seems to equate “probability” with “utility”, and the whole point of the example is that, in this case, they are not the same.
It may be helpful to read some related posts (linked by lukeprog in a comment on this post): Estimate stability, and Model Stability in Intervention Assessment, which comments on Why We Can’t Take Expected Value Estimates Literally (Even When They’re Unbiased). The first of those motivates the A_p (meta-probability) approach, the second uses it, and the third explains intuitively why it’s important in practice.
I’m sure you know more about this than I do! Based on a quick Wiki check, I suspect that formally the A_p are one type of hyperprior, but not all hyperpriors are A_p (a/k/a metaprobabilities).
Hyperparameters are used in Bayesian sensitivity analysis, a/k/a “Robust Bayesian Analysis”, which I recently accidentally reinvented here. I might write more about that later in this sequence.
I don’t think that is demonstrated at all by this example.
Yes, I see your point (although I don’t altogether agree). But, again, what I’m doing here is setting up analytical apparatus that will be helpful for more difficult cases later.
In the meantime, the LW posts I pointed to here may motivate more strongly the claim that probability alone is an insufficient guide to action.
Decisions are made on the basis of expected value, not probability.
Yes, that’s the point here!
your analysis of the first bet ignores the value of the information gained from it in executing your options for further play thereafter.
By “the first bet” I take it that you mean “your first opportunity to put a coin in a green box” (rather than meaning “brown box”).
My analysis of that was “you should put some coins in the box”, exactly because of the information gain.
This statement indicates a lack of understanding of Jaynes, or at least an adherence to his foundations.
This post was based closely on Chapter 18 of Jaynes’ book, where he writes:
Suppose you have a penny and you are allowed to examine it carefully, and convince yourself that it is an honest coin; i.e. accurately round, with head and tail, and a center of gravity where it ought to be. Then you’re asked to assign a probability that this coin will come up heads on the first toss. I’m sure you’ll say 1⁄2. Now, suppose you are asked to assign a probability to the proposition that there was once life on Mars. Well, I don’t know what your opinion is there, but on the basis of all the things that I have read on the subject, I would again say about 1⁄2 for the probability. But, even though I have assigned the same ‘external’ probabilities to them, I have a very different ‘internal’ state of knowledge about those propositions.
Do you think he’s saying something different from me here?
Yup, it’s definitely wrong! I was hoping no one would notice. I thought it would be a distraction to explain why the two are different (if that’s not obvious), and also I didn’t want to figure out exactly what the right math was to feed to my plotting package for this case. (Is the correct form of the curve for the p=0 case obvious to you? It wasn’t obvious to me, but this isn’t my area of expertise...)
Thanks! Fixed.
Glad you liked it!
I also get “stop after two losses,” although my numbers come out slightly differently. However, I suck at this sort of problem, so it’s quite likely I’ve got it wrong.
My temptation would be to solve it numerically (by brute force), i.e. code up a simulation and run it a million times and get the answer by seeing which strategy does best. Often that’s the right approach. However, sometimes you can’t simulate, and an analytical (exact, a priori) answer is better.
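A minimal sketch of that brute-force approach, under assumptions this comment does not state: each play costs one coin and a win pays two, a brown box pays out with probability 0.45, and a green box is equally likely to pay out with probability 0.9 or never. The names `play_box`, `average_net`, `brown_box`, and `green_box` are hypothetical, and "stop after k losses" is read here as k losses in a row, with an arbitrary cap on the number of plays.

```python
import random


def play_box(payout_prob, stop_after_losses, max_plays, rng):
    """Net coins from one box when quitting after k consecutive losses."""
    net = 0
    losses_in_a_row = 0
    for _ in range(max_plays):
        if losses_in_a_row >= stop_after_losses:
            break
        net -= 1  # assumed cost: one coin per play
        if rng.random() < payout_prob:
            net += 2  # assumed payout: two coins on a win
            losses_in_a_row = 0
        else:
            losses_in_a_row += 1
    return net


def brown_box(rng):
    """Assumed brown box: every box pays out with probability 0.45."""
    return 0.45


def green_box(rng):
    """Assumed green box: half pay with probability 0.9, half never pay."""
    return 0.9 if rng.random() < 0.5 else 0.0


def average_net(draw_payout_prob, stop_after_losses,
                max_plays=100, trials=5000, seed=0):
    """Average net coins per box over many simulated boxes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        payout_prob = draw_payout_prob(rng)
        total += play_box(payout_prob, stop_after_losses, max_plays, rng)
    return total / trials


if __name__ == "__main__":
    # Compare stopping rules k = 0 (never play) through k = 4 for both boxes.
    for k in range(5):
        print(k, average_net(brown_box, k), average_net(green_box, k))
```

Under these assumptions the chance of a payout on the first coin is 0.45 for both kinds of box, yet the averages for the two differ for every rule that plays at all. Which value of k comes out best for the green box depends on the cap `max_plays`, so the numbers are only as good as that choice.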
I think you are right about the sportsball case! I’ve updated my meta-meta-probability curve accordingly :-)
Can you think of a better example, in which the curve ought to be dead flat?
Jaynes uses “the probability that there was once life on Mars” in his discussion of this. I’m not sure that’s such a great example either.
So, let me try again to explain why I think this is missing the point… I wrote “a single probability value fails to capture everything you know about an uncertain event.” Maybe “simple” would have been better than “single”?
The point is that you can’t solve this problem without somehow reasoning about probabilities of probabilities. You can solve it by reasoning about the expected value of different strategies. (I said so in the OP; I constructed the example to make this the obviously correct approach.) But those strategies contain reasoning about probabilities within them. So the “outer” probabilities (about strategies) are meta-probabilistic.
[Added:] Evidently, my OP was unclear and failed to communicate, since several people missed the same point in the same way. I’ll think about how to revise it to make it clearer.
We could also try to summarize some features of such epistemic states by talking about the instability of estimates—the degree to which they are easily updated by knowledge of other events
Yes, this is Jaynes’ A_p approach.
this will be a derived feature of the probability distribution, rather than an ontologically extra feature of probability.
I’m not sure I follow this. There is no prior distribution for the per-coin payout probabilities that can accurately reflect all our knowledge.
I reject that this is a good reason for probability theorists to panic.
Yes, it’s clear from comments that my OP was somewhat misleading as to its purpose. Overall, the sequence intends to discuss cases of uncertainty in which probability theory is the wrong tool for the job, and what to do instead.
However, this particular article intended only to introduce the idea that one’s confidence in a probability estimate is independent of that estimate, and to develop the A_p (meta-probability) approach to expressing that confidence.
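One way to make that independence concrete is sketched below, assuming the meta-probability curve is a Beta distribution (a convenient choice for illustration, not something Jaynes or the article requires). Both priors give an estimate of 1/2, but they respond very differently to the same evidence. The function name `posterior_mean` and the specific parameters are hypothetical.

```python
from fractions import Fraction


def posterior_mean(alpha, beta, heads, tails):
    """Mean of a Beta(alpha, beta) prior after observing heads and tails."""
    return Fraction(alpha + heads, alpha + beta + heads + tails)


# Two assumed states of knowledge with the same point estimate of 1/2.
priors = {
    "sharply peaked (well-examined coin)": (1000, 1000),
    "flat (no idea)": (1, 1),
}

for name, (alpha, beta) in priors.items():
    before = posterior_mean(alpha, beta, 0, 0)
    after = posterior_mean(alpha, beta, 5, 0)  # five heads in a row
    print(f"{name}: {before} -> {after}")
```

After five heads in a row, the sharply peaked prior moves only to 201/401, while the flat prior moves to 6/7. The estimate was the same beforehand; the confidence in it was not.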
Are you claiming there’s no prior distribution over sequences which reflects our knowledge?
No. Well, not so long as we’re allowed to take our own actions into account!
I want to emphasize—since many commenters seem to have mistaken me on this—that there’s an obvious, correct solution to this problem (which I made explicit in the OP). I deliberately made the problem as simple as possible in order to present the A_p framework clearly.
Are we talking about the Laplace vs. fair coins?
Not sure what you are asking here, sorry...
Regarding the development of agreeableness/empathy: there are meditation techniques specifically intended to do this. (They are variously called “Metta”, “Lojong”, “Tonglen”, or (yuck) “loving kindness meditation”; all of which are pretty similar.) These originate in Mahayana Buddhism, but don’t have any specifically religious content. They are often taught in conjunction with mindfulness meditation.
I don’t know whether there have been any serious studies on these methods, but anecdotally they are highly effective. They seem not only to develop empathy, but also personal happiness (although that is not a stated goal). Generally, the serious studies that have been done on different meditation techniques have found that they work as advertised...