It’s important to note that accuracy and calibration are two different things. I’m mentioning this because the OP asks for calibration metrics, but several answers so far give accuracy metrics. Any proper scoring rule is a measure of accuracy as opposed to calibration.
It is possible to be very well-calibrated but very inaccurate; for example, you might know that it is going to be Monday 1/7th of the time, so you give a probability of 1/7th. Everyone else just knows what day it is. On a calibration graph, you would be perfectly lined up; when you say 1/7th, the thing happens 1/7th of the time.
It is also possible to have high accuracy and poor calibration. Perhaps you can guess coin flips when no one else can, but you are wary of your precognitive powers, which makes you underconfident. So, you always place 60% probability on the event that actually happens (heads or tails). Your calibration graph is far out of line, but your accuracy is higher than anyone else's.
In terms of improving rationality, the interesting thing about calibration is that (as in the precog example) if you know you’re poorly calibrated, you can boost your accuracy simply by improving your calibration. In some sense it is a free improvement: you don’t need to know anything more about the domain; you get more accurate just by knowing more about yourself (by seeing a calibration chart and adjusting).
However, if you just try to be more calibrated without any concern for accuracy, you could be like the person who says 1/7th. So, just aiming to do well on a score of calibration is not a good idea. This could be part of the reason why calibration charts are presented instead of calibration scores. (Another reason being that calibration charts help you know how to adjust to increase calibration.)
That being said, a decomposition of a proper scoring rule into components including a measure of calibration, like Dark Denego gives, seems like the way to go.
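To make the accuracy/calibration split concrete, here is a minimal sketch (my own illustration, using the standard Murphy decomposition of the Brier score, which may or may not match the decomposition in the answer I'm referring to) scoring the two toy forecasters from above:

```python
# Murphy decomposition: Brier = reliability - resolution + uncertainty.
# "reliability" is the calibration term (0 = perfectly calibrated);
# the Brier score itself is the accuracy (proper scoring rule) number.
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """forecasts: predicted probabilities; outcomes: 0/1 results of the same events."""
    n = len(forecasts)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    reliability = sum(len(obs) * (f - sum(obs) / len(obs)) ** 2 for f, obs in bins.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in bins.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return brier, reliability, resolution, uncertainty

# "Is it Monday?" forecaster: always says 1/7.
days = ([1] + [0] * 6) * 100
print(brier_decomposition([1 / 7] * len(days), days))

# Underconfident precog: puts 60% on whichever way the coin actually lands.
flips = [1, 0] * 350
print(brier_decomposition([0.6 if o else 0.4 for o in flips], flips))
```

The day-of-the-week forecaster comes out nearly perfectly calibrated (reliability ≈ 0) but not very accurate (Brier ≈ 0.12), while the underconfident precog comes out more accurate than chance (Brier 0.16 vs. 0.25 for honest 50% guesses) but badly calibrated (reliability 0.16); pushing those 60% forecasts toward 100% is exactly the "free" accuracy improvement described above.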
I guess, philosophically, I worry that giving the nodes special types like that pushes people toward thinking about agents as not-embedded-in-the-world, thinking things like “we need to extend Bayes nets to represent actions and utilities, because those are not normal variable nodes”. Not that memoryless cartesian environments are any better in that respect.
Hrm. I realize that the post would be comprehensible to a much wider audience with a glossary, but there's one level of effort needed for me to write posts like this one, and another level needed for posts where I try to be comprehensible to someone who lacks all the jargon of MIRI-style decision theory. Basically, if I write with a broad audience in mind, then I'm modeling all the inferential gaps and explaining a lot more details. I would never get to points like the one I'm trying to make in this post. (I've tried.) Posts like this are primarily for the few people who have kept up with the CDT=EDT sequence so far, to get my updated thinking in writing in case anyone wants to go through the effort of trying to figure out what in the world I mean. To people who need a glossary, I recommend searching LessWrong and the Stanford Encyclopedia of Philosophy.
I’ve avoided people/conversations on those grounds, but I’m not sure it is the best way to deal with it. And I really do think good intellectual progress can be made at level 2. As Ruby said in the post I’m replying to, intellectual debate is common in analytic philosophy, and it does well there.
Maybe my description of intellectual debate makes you think of all the bad arguments-are-soldiers stuff. Which it should. But, I think there’s something to be said about highly developed cultures of intellectual debate. There are a lot of conventions which make it work better, such as a strong norm of being charitable to the other side (which, in intellectual-debate culture, means an expectation that people will call you out for being uncharitable). This sort of simulates level 3 within level 2.
As for level 1, you might be able to develop some empathy for it at times when you feel particularly vulnerable and need people to do something to affirm your belongingness in a group or conversation. Keep an eye out for times when you appreciate level-one behavior from others, times when you would have appreciated some level-one comfort, or times when other people engage in level-one behavior (and decide whether it was helpful in the situation). It's nice when we can get to a place where no one's ego is on the line when they offer ideas, but sometimes it just is. Ignoring it doesn't make it go away; it just makes you manage it ineptly. My guess is that you are involved in more level-one situations than you think, and would endorse some of it.
(lightly edited version of my original email reply to the above comment; note that Diffractor was originally replying to a version of the Dutch-book argument which didn't yet call out the fact that it required an assumption of nonzero probability on actions.)
I agree that this Dutch-book argument won’t touch probability zero actions, but my thinking is that it really should apply in general to actions whose probability is bounded away from zero (in some fairly broad setting). I’m happy to require an epsilon-exploration assumption to get the conclusion.
Your thought experiment raises the issue of how to ensure in general that adding bets to a decision problem doesn't change the decisions made. One thought I had was to make the bets always smaller than the difference in utilities. Perhaps smaller Dutch-books are in some sense less concerning, but as long as they don't vanish to infinitesimal size, the argument seems legitimate. A bet that's desirable at one scale is desirable at another. But scaling down bets may not suffice in general. Perhaps what's needed is a bet-balancing scheme which ensures that nothing changes the comparative desirability of actions as the decision is made?
For your cosmic ray problem, what about:
You didn’t specify the probability of a cosmic ray. I suppose it should have probability higher than the probability of exploration. Let’s say 1/million for cosmic ray, 1/billion for exploration.
Before the agent makes the decision, it can be given the option to lose .01 util if it goes right, in exchange for +.02 utils if it goes right & cosmic ray. This will be accepted (by either a CDT agent or EDT agent), because it is worth approximately +.01 util conditioned on going right, since cosmic ray is almost certain in that case.
Then, while making the decision, cosmic ray conditioned on going right looks very unlikely in terms of CDT’s causal expectations. We give the agent the option of getting .001 util if it goes right, if it also agrees to lose .02 conditioned on going right & cosmic ray.
CDT agrees to both bets, and so loses money upon going right.
Ah, that’s not a very good money pump. I want it to lose money no matter what. Let’s try again:
Before decision: option to lose 1 millionth of a util in exchange for 2 utils if right&ray.
During decision: option to gain .1 millionth util in exchange for −2 util if right&ray.
That should do it. CDT loses .9 millionths of a util, with nothing gained. And the trick is almost the same as my Dutch book for Death in Damascus. I think this should generalize well.
The amounts of money lost in the Dutch Book get very small, but that’s fine.
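Here's a quick sketch (my own check, just re-running the numbers above in code) that the guaranteed loss really is branch-independent:

```python
# Sanity check of the numbers above (cosmic ray: 1/million, exploration: 1/billion).
# Granting that the agent accepts both bets, it ends up down 0.9 millionths of a util
# no matter which branch actually happens.

def total_payoff(ray: bool, explore: bool) -> float:
    """Combined payoff of the two bets in one branch."""
    right = ray or explore
    bet1 = -1e-6 + (2 if (right and ray) else 0)   # offered before the decision
    bet2 = 1e-7 - (2 if (right and ray) else 0)    # offered during the decision
    return bet1 + bet2

for ray in (False, True):
    for explore in (False, True):
        print(f"ray={ray}, explore={explore}: {total_payoff(ray, explore):+.1e}")
# Every branch prints about -9.0e-07: a guaranteed loss, as claimed.
```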
“The expectations should be equal for actions with nonzero probability”—this means a CDT agent should have equal causal expectations for any action taken with nonzero probability, and EDT agents should similarly have equal evidential expectations. Actually, I should revise my statement to be more careful: in the case of epsilon-exploring agents, the condition is >epsilon rather than >0. In any case, my statement there isn't about evidential and causal expectations being equal to each other, but rather about one of them being constant across (sufficiently probable) actions.
“differing counterfactual and evidential expectations are smoothly more and more tenable as actions become less and less probable”—this means that the amount we can take from a CDT agent through a Dutch Book, for an action which is given a different causal expectation than evidential expectation, smoothly reduces as the probability of the action goes to zero. In that statement, I was assuming you hold the difference between evidential and causal expectations constant as you reduce the probability of the action. Otherwise it's not necessarily true.
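As a rough back-of-the-envelope version of that last point (my own gloss, using the cosmic-ray book above as the template), the bookie's guaranteed take there is on the order of

$$\text{guaranteed loss} \;\approx\; P(a)\cdot\big|\,\mathbb{E}_{\text{evid}}[X\mid a]-\mathbb{E}_{\text{caus}}[X\mid a]\,\big|,$$

where X is the quantity the bets are written on. So holding the evidential-causal gap fixed while sending P(a) to zero drives the extractable amount to zero smoothly, which is the sense of "smoothly more tenable" intended.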
I think it's usually a good idea overall, but there is a less cooperative conversational tactic which tries to masquerade as this: listing a number of plausible straw-men in order to create the appearance that all possible interpretations of what the other person is saying are bad. (From the inside, this feels like: all possible interpretations are bad; I'll demonstrate it exhaustively...)
It’s not completely terrible, because even this combative version of the conversational move opens up the opportunity for the other person to point out the (n+1)th interpretation which hasn’t been enumerated.
You can try to differentiate yourself from this via tone (by not sounding like you’re trying to argue against the other person in asking the question), but, this will only be somewhat successful since someone trying to make the less cooperative move will also try to sound like they’re honestly trying to understand.
My gut response is that hillclimbing is itself consequentialist, so this doesn’t really help with fragility of value; if you get the hillclimbing direction slightly wrong, you’ll still end up somewhere very wrong. On the other hand, Paul’s approach rests on something which we could call a deontological approach to the hillclimbing part (IE, amplification steps do not rely on throwing more optimization power at a pre-specified function).
I wouldn’t say that preference utilitarianism “falls apart”; it just becomes much harder to implement.
And I’d like a little more definition of “autonomy” as a value—how do you operationally detect whether you’re infringing on someone’s autonomy?
My (still very informal) suggestion is that you don’t try to measure autonomy directly and optimize for it. Instead, you try to define and operate from informed consent. This (maybe) allows a system to have enough autonomy to perform complex and open-ended tasks, but not so much that you expect perverse instantiations of goals.
My proposed definition of informed consent is “the human wants X and understands the consequences of the AI doing X”, where X is something like a probability distribution on plans which the AI might enact. (… that formalization is very rough)
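To show the shape I have in mind, here is a toy sketch (entirely my own illustration; every name is hypothetical, X is simplified to a single plan rather than a distribution over plans, and "understands the consequences" is doing all the hard work):

```python
# Toy sketch of a consent gate in front of plan execution. The point is just the shape of
# the check: the system may only enact X when the human both wants X and understands the
# predicted consequences of the AI doing X.
from dataclasses import dataclass, field
from typing import Callable, Sequence

@dataclass
class Plan:
    description: str
    predicted_consequences: Sequence[str] = field(default_factory=list)  # shown to the human

def informed_consent(plan: Plan,
                     human_wants: Callable[[Plan], bool],
                     human_understands: Callable[[Plan], bool]) -> bool:
    # Both conditions are required: wanting X without grasping what the AI doing X leads to
    # (or vice versa) does not count as informed consent.
    return human_wants(plan) and human_understands(plan)

def maybe_enact(plan: Plan,
                human_wants: Callable[[Plan], bool],
                human_understands: Callable[[Plan], bool],
                enact: Callable[[Plan], None]) -> bool:
    if informed_consent(plan, human_wants, human_understands):
        enact(plan)
        return True
    return False  # refuse rather than optimize around the refusal
```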
Is it just the right to make bad decisions (those which contradict stated goals and beliefs)?
This is certainly part of respecting an agent’s autonomy. I think more generally respecting someone’s autonomy means not taking away their freedom, not making decisions on their behalf without having prior permission to do so, and avoiding operating from assumptions about what is good or bad for a person.
Autonomy is a value and can be expressed as a part of a utility function, I think. So ambitious value learning should be able to capture it, and an aligned AI based on ambitious value learning would respect someone's autonomy when they value it themselves. If they don't, why impose it upon them?
One could make a similar argument for corrigibility: ambitious value learning would respect our desire for it to behave corrigibly if we actually wanted that, and if we didn’t want that, why impose it?
Corrigibility makes sense as something to ensure in its own right because it is good to have in case the value learning is not doing what it should (or something else is going wrong).
I think respect for autonomy is similarly useful. It helps avoid evil-genie (perverse instantiation) type failures by requiring that we understand what we are asking the AI to do. It helps avoid preference-manipulation problems which value learning approaches might otherwise have, because regardless of how well expected-human-value is optimized by manipulating human preferences, such manipulation usually involves fooling the human, which violates autonomy.
(In cases where humans understand the implications of value manipulation and consent to it, it’s much less concerning—though we still want to make sure the AI isn’t prone to pressure humans into that, and think carefully about whether it is really OK.)
Is the point here that you expect we can't solve those problems and therefore need an alternative? The idea doesn't help with “the difficulties of assuming human rationality”, though, so what problems does it help with?
It’s less an alternative in terms of avoiding the things which make value learning hard, and more an alternative in terms of providing a different way to apply the same underlying insights, to make something which is less of a ruthless maximizer at the end.
In other words, it doesn’t avoid the central problems of ambitious value learning (such as “what does it mean for irrational beings to have values?“), but it is a different way to try to put those insights together into a safe system. You might add other safety precautions to an ambitious value learner, such as [ambitious value learning + corrigibility + mild optimization + low impact + transparency]. Consent-based systems could be an alternative to that agglomerated approach, either replacing some of the safety measures or making them less difficult to include by providing a different foundation to build on.
Is the idea that even trying to do ambitious value learning constitutes violating someone’s autonomy (in other words someone could have a preference against having ambitious value learning done on them) and by the time we learn this it would be too late?
I think there are a couple of ways in which this is true.
I mentioned cases where a value-learner might violate privacy in ways humans wouldn’t want, because the overall result is positive in terms of the extent to which the AI can optimize human values. This is somewhat bad, but it isn’t X-risk bad. It’s not my real concern. I pointed it out because I think it is part of the bigger picture; it provides a good example of the kind of optimization a value-learner is likely to engage in, which we don’t really want.
I think the consent/autonomy idea actually gets close (though maybe not close enough) to something fundamental about safety concerns which follow an “unexpected result of optimizing something reasonable-looking” pattern. As such, it may be better to make it an explicit design feature, rather than trust the system to realize that it should be careful about maintaining human autonomy before it does anything dangerous.
It seems plausible that, interacting with humans over time, a system which respects autonomy at a basic level would converge to different overall behavior than a value-learning system which trades autonomy off against other values. If you actually get ambitious value learning really right, this is just bad. But, I don't endorse your “why impose it on them?” argument. Humans could eventually decide to run all-out value-learning optimization (without mild optimization, without low-impact constraints, without hard-coded corrigibility). Preserving human autonomy in the meantime seems like a reasonable precaution.
Abstracting your idea a little: in order to go beyond first thoughts, you need some kind of strategy for developing ideas further. Without one, you will just have the same thoughts when you try to “think more” about a subject. I’ve edited my answer to elaborate on this idea.
Well, my original intention was definitely more like “why don’t more people keep developing their ideas further?” as opposed to “why don’t more people have ideas?”—but, I definitely grant that sharing ideas is what I actually am able to observe.
If someone had commented with a one-line answer like “people are intellectually active if it is rewarding”, I would have been very meh about it—it's obviously true, but trivial. All the added detail you gave makes it seem like a pretty useful observation, though.
Two possible caveats --
1) What determines what's rewarding? Any set of behaviors can be explained by positing that they're rewarding, so for this kind of model to be meaningful, there's got to be a set of rewards involved which are relatively simple and have relatively broad explanatory power.
2) In order for a behavior to be rewarded in the first place, it has to be generated the first time. How does that happen? Animal trainers build up complicated tricks by rewarding steps incrementally approaching the desired behavior. Are there similar incremental steps here? What are they, and what rewards are associated with them?
(Your spelled-out details give some ideas in those directions.)
How do you manage your pipeline beyond collecting ideas?
I used to simply have an idea notebook. Writing down ideas was a monolithic activity in my head, encompassing everything from capturing the initial thought to developing it further and communicating it. I now think of those as three very different stages.
Capturing ideas: having appropriate places to write down thoughts as short memory aids, maybe a few words or a sentence or two.
Developing ideas: explaining the idea to myself in text. This allows me to take the seed of an idea and try different ways of fleshing out details, refine it, maybe turn it into something different.
Communicating: explaining it such that other people can understand it. This forces the idea to be given much more detail, often uncovering additional problems. So, I get another revision of the idea out of this (even if no one reads the writeup).
My pipeline definitely doesn’t always work, though. In particular, capturing an idea does not guarantee that I will later develop it. I find that even if I capture an idea, I’ll drop it by default if it isn’t collected with a set of related ideas which are part of an ongoing thought process. This is somewhat tricky to accomplish.
Concerning TV Tropes --
I think a primary, maybe the primary, effect that the Sequences have on a reader's thinking is through this kind of pattern-matching. It is shallow, as rationality techniques go, but it can have a large effect nonetheless. It's like the only rationality technique you have is TAPs, and you only set up TAPs of the form “resemblance to rationality concept” → “think of rationality concept”. But, those TAPs can still be quite useful, since thinking of a relevant concept may lead to something.
Concerning the rest—not sure what to comment on, but it is a datapoint.
You seem to be claiming that it is a personality trait (something which influences how a person will interact with a broad variety of ideas and circumstances), which may or may not be true. Suggesting that it is a personality trait also comes with connotations that it would be hard to change, and may have origins in genetics or early childhood.
I’m somewhat skeptical of both claims. I suppose I think there is a broad personality factor which makes some difference, but for one person, it will tend to vary a lot from subject to subject, with potentially large (per-subject) variations throughout life (but especially around one’s teens perhaps).
What Martin is describing might somewhat resemble OCD, without actually being OCD. Let’s just say that some degree of obsession seems related to the development of ideas, at least in some cases.
I did want to focus on the descriptive question rather than the normative question. It is possible that almost all intellectual progress comes from obsessive people, while it’s also “not the happiest or most fruitful path”. Do you think that’s wrong? If so, why do you think there are other common paths? I’m actually fairly skeptical of that. It seems very plausible that obsession is causally important.
I think a big contributing factor is having some kind of intellectual community / receptive audience. Having a social context in which new ideas are expected, appreciated, and refined creates the affordance to really think about things.
The way I see it, contact with such a community only needs to happen initially. After that, many people will keep developing ideas on their own.
A school/work setting doesn’t seem to count for as much as a less formal voluntary group. It puts thinking in the context of “for work” / “for school”, which may even actively discourage developing one’s own ideas later.
Also, it seems like attempts to start intellectual groups in order to provide the social context for developing ideas will often fail. People don’t know how to start good groups by default, and there is a lot which can go wrong.
Another important bottleneck is having a mental toolkit for working on hard problems. One reason why people don’t go past the first answer which comes to mind is that they don’t have any routines to follow which get them past their first thoughts. Even if you’re asked to think more about a problem, you’ll likely rehearse the same thoughts, and reach the same conclusions, unless you have a strategy for getting new thoughts. Johnswentworth’s answer hints at this direction.
The best resource I know of for developing this kind of mental toolkit is Polya’s book How to Solve It. He provides a set of questions to ask yourself while problem-solving. At first, these questions may seem like object-level tools to help you get unstuck when you are stuck, which is true. But over time, asking the questions will help you develop a toolkit for thinking about problems.
There are a variety of axiom systems which justify mostly similar notions of rationality, and a few posts explore these axiom systems. Sniffnoy summarized Savage's axioms. I summarized some approaches and why I think it may be possible to do better. I wrote in detail about complete class theorems. I also took a look at consequences of the Jeffrey-Bolker axioms. (Jeffrey-Bolker and complete class are my two favorite ways to axiomatize things, and they have some very different consequences!)
As many others are emphasizing, these axiomatic approaches don’t really summarize rationality-as-practiced, although they are highly connected. Actually, I think people are kind of downplaying the connection. Although de-biasing moves such as de-anchoring aren’t usually justified by direct appeal to rationality axioms, it is possible to flesh out that connection, and doing this with enough things will likely improve your decision-theoretic thinking.
1) The fact that there are many alternative axiom systems, and that we can judge them for various good/bad features, illustrates that one set of axioms doesn’t capture the whole of rationality (at least, not yet).
2) The fact that not even the Sequences deal much with these axioms shows that they need not be central to a practice of rationality. Thoroughly understanding probability and expected utility as calculations, and understanding that there are strong arguments for these calculations in particular, is more important.
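(For concreteness, the calculation in question is just the standard one: pick the action maximizing

$$\mathbb{E}[U\mid a]=\sum_{o}P(o\mid a)\,U(o),$$

and the axiom systems above are, roughly, competing arguments for why a rational agent's preferences must be representable in this form at all.)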