I’m saying that non-uniqueness of the solution is part of the conceptual problem with Nash equilibria.
Decision theory doesn’t exactly provide a “unique solution”—it’s a theory of rational constraints on subjective belief, so you can believe and do whatever you want within the confines of those rationality constraints. And of course classical decision theory has problems of its own (such as logical omniscience). But there is a sense in which it does better than game theory here, since game theory gives rationality constraints which depend on the other player in ways that are difficult to make real.
I’m not saying there’s some strategy which works regardless of the other player’s strategy. In single-player decision theory, you can still say “there’s no optimal strategy due to uncertainty about the environment”—but you get to say “there’s an optimal strategy given our uncertainty about the environment”, and this ends up being a fairly satisfying analysis. The Nash-equilibrium picture of game theory lacks a similarly satisfying analysis. But this does not seem essential to game theory.
Something which seems missing from this discussion is the level of confidence we can have for/against CRT. It doesn’t make sense to just decide whether CRT seems more true or false and then go from there. If CRT seems at all possible (ie, outside-view probability at least 1%), doesn’t that have most of the strategic implications of CRT itself? (Like the ones you list in the relevance-to-xrisk section.) [One could definitely make the case for probabilities lower than 1%, too, but I’m not sure where the cutoff should be, so I said 1%.]
My personal position isn’t CRT (although inner-optimizer considerations have brought me closer to that position), but rather, not-obviously-not-CRT. Strategies which depend on not-CRT should go along with actually-quite-strong arguments against CRT, and/or technology for making CRT not true. It makes sense to pursue those strategies, and I sometimes think about them. But achieving confidence in not-CRT is a big obstacle.
Another obstacle to those strategies is, even if future AGI isn’t sufficiently strategic/agenty/rational to fall into the “rationality attractor”, it seems like it would be capable enough that someone could use it to create something agenty/rational enough for CRT. So even if CRT-type concerns don’t apply to super-advanced image classifiers or whatever, the overall concern might stand because at some point someone applies the same technology to RL problems, or asks a powerful GAN to imitate agentic behavior, etc.
Of course it doesn’t make sense to generically argue that we should be concerned about CRT in absence of a proof of its negation. There has to be some level of background reason for thinking CRT might be a concern. For example, although atomic weapons are concerning in many ways, it would not have made sense to raise CRT concerns about atomic weapons and ask for a proof of not-CRT before testing atomic weapons. So there has to be something about AI technology which specifically raises CRT as a concern.
One “something” is, simply, that natural instances of intelligence are associated with a relatively high degree of rationality/strategicness/agentiness (relative to non-intelligent things). But I do think there’s more reasoning to be unpacked.
I also agree with other commenters about CRT not being quite the right thing to point at, but, this issue of the degree of confidence in doubt-of-CRT was the thing that struck me as most critical. The standard of evidence for raising CRT as a legitimate concern seems like it should be much lower than the standard of evidence for setting that concern aside.
True, but, I think that’s a bad way of thinking about game theory:
The Nash equilibrium model assumes that players somehow know what equilibrium they’re in. Yet, it gives rise to an equilibrium selection problem due to the non-uniqueness of equilibria. This casts doubt on the assumption of common knowledge which underlies the definition of equilibrium.
Nash equilibria also assume a naive best-response pattern. If an agent faces a best-response agent and we assume that the Nash-equilibrium knowledge structure somehow makes sense (there is some way that agents successfully coordinate on a fixed point), then it would make more sense for an agent to select its response function (to, possibly, be something other than argmax), based on what gets the best response from the (more-naive) other player. This is similar to the UDT idea. Of course you can’t have both players do this or you’re stuck in the same situation again (ie there’s yet another meta level which a player would be better off going to).
Going to the meta-level like that seems likely to make the equilibrium selection problem worse rather than better, but, that’s not my point. My point is that Nash equilibria aren’t the end of the story; they’re a somewhat weird model. So it isn’t obvious whether a similar no-free-lunch idea applies to a better model of game theory.
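The non-uniqueness point is easy to see concretely: even a 2x2 coordination game (Stag Hunt, with illustrative payoff numbers of my own choosing) has multiple pure-strategy Nash equilibria, so “play a Nash equilibrium” underdetermines what to do. A minimal sketch:

```python
# Enumerate pure-strategy Nash equilibria of a 2x2 game.
# Stag Hunt payoffs (illustrative numbers): both-Stag beats both-Hare,
# but Hare is safer against a partner who plays Hare.
STAG, HARE = 0, 1
payoffs = {  # (row player's payoff, column player's payoff)
    (STAG, STAG): (4, 4),
    (STAG, HARE): (0, 3),
    (HARE, STAG): (3, 0),
    (HARE, HARE): (3, 3),
}

def is_nash(i, j):
    """(i, j) is a pure Nash equilibrium iff neither player gains
    by unilaterally deviating."""
    row_ok = all(payoffs[(i, j)][0] >= payoffs[(i2, j)][0] for i2 in (0, 1))
    col_ok = all(payoffs[(i, j)][1] >= payoffs[(i, j2)][1] for j2 in (0, 1))
    return row_ok and col_ok

equilibria = [(i, j) for i in (0, 1) for j in (0, 1) if is_nash(i, j)]
print(equilibria)  # both (STAG, STAG) and (HARE, HARE) are equilibria
```

Both equilibria pass the best-response test, so nothing in the equilibrium concept itself tells the players which one they are “in”—that’s the selection problem.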
Correlated equilibria are an obvious thing to mention here. They’re a more sensible model in a few ways. I think there are still some unjustified and problematic assumptions there, though.
I don’t want to claim there’s a best way, but I do think there are certain desirable properties which it makes sense to shoot for. But this still sort of points at the wrong problem.
A “naturalistic” approach to game theory is one in which game theory is an application of decision theory (not an extension)—there should be no special reasoning which applies only to other agents. (I don’t know a better term for this, so let’s use naturalistic for now.)
Standard approaches to game theory lack this (to varying degrees). So, one frame is that we would like to come up with an approach to game theory which is naturalistic. Coming from the other side, we can attempt to apply existing decision theory to games. This ends up being more confusing and unsatisfying than one might hope. So, we can think of game theory as an especially difficult stress-test for decision theory.
So it isn’t that there should be some best strategy in multiplayer games, or even that I’m interested in a “better” player despite the lack of a notion of “best” (although I am interested in that). It’s more that UDT doesn’t give me a way to think about games. I’d like to have a way to think about games which makes sense to me, and which preserves as much as possible what seems good about UDT.
Desirable properties such as coordination are important in themselves, but are also playing an illustrative role—pointing at the problem. (It could be that coordination just shouldn’t be expected, and so, is a bad way of pointing at the problem of making game theory “make sense”—but I currently think better coordination should be possible, so, think it is a good way to point at the problem.)
FAI is a sidetrack, if we don’t have any path to FNI (friendly natural intelligence).
I don’t think I understand the reasoning behind this, though I don’t strongly disagree. Certainly it would be great to solve the “human alignment problem”. But what’s your claim?
If a bunch of fully self-interested people are about to be wiped out by an avoidable disaster (or even actively malicious people, who would like to hurt each other a little bit, but value self-preservation more), they’re still better off pooling their resources together to avert disaster.
You might have a prisoner’s dilemma / tragedy of the commons—it’s still even better if you can get everyone else to pool resources to avert disaster, while stepping aside yourself. BUT:
- that’s more a coordination problem again, rather than an everyone-is-too-selfish problem
- that’s not really the situation with AI, because what you have is more a situation where you can either work really hard to build AGI or work even harder to build safe AGI; it’s not a tragedy of the commons, it’s more like lemmings running off a cliff!
One point of confusion in trying to generalize bad behavior (bad equilibrium is an explanation or cause, bad behavior is the actual problem) is that incentives aren’t exogenous—they’re created and perpetuated by actors, just like the behaviors we’re trying to change. One actor’s incentives are another actor’s behaviors.
Yeah, the incentives will often be crafted perversely, which likely means that you can expect even more opposition to clear discussion, because there are powerful forces trying to coordinate on the wrong consensus about matters of fact in order to maintain plausible deniability about what they’re doing.
In the example being discussed here, it just seems like a lot of people coordinating on the easier route, partly due to momentum of older practices, partly because certain established people/institutions are somewhat threatened by the better practices.
I find it very difficult to agree to any generality without identifying some representative specifics. It feels way too much like I’m being asked to sign up for something without being told what. Relatedly, if there are zero specifics that you think fit the generalization well enough to be good examples, it seems very likely that the generalization itself is flawed.
My feeling is that small examples of the dynamic I’m pointing at come up fairly often, but things pretty reliably go poorly if I point them out, which has resulted in an aversion to pointing such things out.
The conversation has so much gravity toward blame and self-defense that it just can’t go anywhere else.
I’m not going to claim that this is a great post for communicating/educating/fixing anything. It’s a weird post.
I see what you mean, but there’s a tendency to think of ‘homo economicus’ as having perfectly selfish, non-altruistic values.
Also, quite aside from standard economics, I tend to think of economic decisions as maximizing profit. Technically, the rational agent model in economics allows arbitrary objectives. But, what kinds of market behavior should you really expect?
When analyzing celebrities, it makes sense to assume rationality with a fame-maximizing utility function, because the people who manage to become and remain celebrities will, one way or another, be acting like fame-maximizers. There’s a huge selection effect. So Homo Hollywoodicus can probably be modeled well with a fame-maximizing assumption.
This has nothing to do with the psychology of stardom. People may have all kinds of motives for what they do—whether they’re seeking stardom consciously or just happen to engage in behavior which makes them a star.
Similarly, when modeling politics, it is reasonable to make a Homo Politicus assumption that people seek to gain and maintain power. The politicians whose behavior isn’t in line with this assumption will never break into politics, or at best will be short-lived successes. This has nothing to do with the psychology of the politicians.
And again, evolutionary game theory treats reproductive success as utility, despite the many other goals which animals might have.
So, when analyzing market behavior, it makes some sense to treat money as the utility function. Those who aren’t going for money will have much less influence on the behavior of the market overall. Profit motives aren’t everything, but other motives will be less important than profit motives in market analysis.
My current understanding of quantilization is “choose randomly from the top X% of actions”. I don’t see how this helps very much with staying on-distribution… as you say, the off-distribution space is larger, so the majority of actions in the top X% of actions could still be off-distribution.
The base distribution you take the top X% of is supposed to be related to the “on-distribution” distribution, such that sampling from the base distribution is very likely to keep things on-distribution, at least if the quantilizer’s own actions are the main potential source of distributional shift. This could be the case if the quantilizer is the only powerful AGI in existence, and the actions of a powerful AGI are the only thing which would push things into sufficiently “off-distribution” possibilities for there to be a concern. (I’m not saying these are entirely reasonable assumptions; I’m just saying that this is one way of thinking about quantilization.)
In any case, quantilization seems like it shouldn’t work, due to the fragility-of-value thesis. If we were to order all of the possible configurations of Earth’s atoms from best to worst according to our values, the top 1% still mostly consists of configurations which aren’t very valuable.
The base distribution quantilization samples from is about actions, or plans, or policies, or things like that—not about configurations of atoms.
So, you should imagine a robot sending random motor commands to its actuators, not highly intelligently steering the planet into a random configuration.
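A minimal sketch of that picture (the base distribution, the utility function, and the sample sizes here are made-up placeholders, not anything from the quantilization literature): a q-quantilizer samples candidate actions from a base distribution and then picks among the top q fraction by utility, rather than argmaxing over all possible actions.

```python
import random

def quantilize(base_sample, utility, q=0.01, n=10_000, rng=None):
    """Toy q-quantilizer: draw n actions from the base distribution,
    then sample uniformly from the top q fraction by utility.
    Every candidate came from the base distribution, so the chosen
    action stays roughly as 'on-distribution' as the base itself."""
    rng = rng or random.Random(0)
    candidates = [base_sample(rng) for _ in range(n)]
    candidates.sort(key=utility, reverse=True)
    top = candidates[: max(1, int(q * n))]
    return rng.choice(top)

# Hypothetical stand-ins: a 1-D "motor command" space and a task score.
action = quantilize(
    base_sample=lambda rng: rng.gauss(0.0, 1.0),  # placeholder action space
    utility=lambda a: -(a - 0.5) ** 2,            # placeholder objective
)
print(action)  # an action near 0.5, but drawn from the base distribution
```

The contrast with the atoms-of-Earth picture is that the top 1% is taken over draws from the base distribution, not over all configurations an optimizer could reach.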
Thinking up actual historical examples is hard for me. The following is mostly true, partly made up.
(#4) I don’t necessarily have trouble talking about my emotions, but when there are any clear incentives for me to make particular claims, I tend to shut down. It feels viscerally dishonest (at least sometimes) to say things, particularly positive things, which I have an incentive to say. For example, responding “it’s good to see you too” in response to “it’s good to see you” sometimes (not always) feels dishonest even when true.
(#4) Talking about money with an employer feels very difficult, in a way that’s related to intuitively discarding any motivated arguments and expecting others to do the same.
(#6) I’m not sure if I was at the party, but I am generally in the crowd Grognor was talking about, and very likely engaged in similar behavior to what he describes.
(#5) I have tripped up when trying to explain something because I noticed myself reaching for examples to prove my point, and the “cherry-picking” alarm went off.
(#5, #4) I have noticed that a friend was selecting arguments that I should go to the movies with him in a biased way which ignored arguments to the contrary, and ‘shut down’ in the conversation (became noncommittal / slightly unresponsive).
(#3) I have thought in mistaken ways which would have accepted modest-epistemology arguments, when thinking about decision theory.
By “is a PD”, I mean, there is a cooperative solution which is better than any Nash equilibrium. In some sense, the self-interest of the players is what prevents them from getting to the better solution.
By “is a SH”, I mean, there is at least one good cooperative solution which is an equilibrium, but there are also other equilibria which are significantly worse. Some of the worse outcomes can be forced by unilateral action, but the better outcomes require coordinated action (and attempted-but-failed coordination is even worse than the bad solutions).
In iterated PD (with the right assumptions, eg appropriately high probabilities of the game continuing after each round), tit-for-tat is an equilibrium strategy which results in a pure-cooperation outcome. The remaining difficulty of the game is the difficulty of ending up in that equilibrium. There are many other equilibria which one could equally well end up in, including total mutual defection. In that sense, iteration can turn a PD into a SH.
Other modifications, such as commitment mechanisms or access to the other player’s source code, can have similar effects.
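To make the iterated-PD point concrete, here is a toy simulation (using the standard illustrative payoffs 3/0/5/1): tit-for-tat against tit-for-tat yields pure cooperation, while all-defect against all-defect is a much worse equilibrium, so which outcome you get depends entirely on which equilibrium the players coordinate on.

```python
# Toy iterated Prisoner's Dilemma with fixed round count.
C, D = "C", "D"
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return history[-1][1] if history else C

def always_defect(history):
    return D

def play(p1, p2, rounds=100):
    h1, h2 = [], []  # each player's view of history: (my move, their move)
    s1 = s2 = 0
    for _ in range(rounds):
        m1, m2 = p1(h1), p2(h2)
        a, b = PAYOFF[(m1, m2)]
        s1, s2 = s1 + a, s2 + b
        h1.append((m1, m2))
        h2.append((m2, m1))
    return s1, s2

print(play(tit_for_tat, tit_for_tat))      # (300, 300): full cooperation
print(play(always_defect, always_defect))  # (100, 100): the bad equilibrium
print(play(tit_for_tat, always_defect))    # (99, 104): TFT only loses round one
```

Neither player can profitably deviate unilaterally from either of the first two matchups, which is exactly the Stag-Hunt structure: a good equilibrium and a bad one, with coordination deciding between them.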
I view the issue of intellectual modesty much like the issue of anthropics: the only people who matter are those whose decisions are subjunctively linked to yours. (It only starts getting complicated when you ask whether you should be intellectually modest about your reasoning about intellectual modesty.)
I agree fairly strongly, but this seems far from the final word on the subject, to me.
One issue with the clever arguer is that the persuasiveness of their arguments might have very little to do with how persuasive they should be, so attempting to work off expectations might fail.
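A toy illustration of that decoupling (the evidence model and all numbers here are made up for the sketch): if a clever arguer searches many pieces of evidence and presents only the strongest, the presented evidence for a false claim can look nearly as strong as honestly-found evidence for a true one.

```python
import random
import statistics

rng = random.Random(0)

def loglik_evidence(claim_true, rng):
    """Made-up model: each piece of evidence is a log-likelihood ratio,
    centred at +1 if the claim is true and -1 if it is false."""
    return rng.gauss(1.0 if claim_true else -1.0, 1.0)

# Honest arguer: reports one randomly-found piece of evidence; claim is TRUE.
honest = [loglik_evidence(True, rng) for _ in range(10_000)]
# Clever arguer: searches 20 pieces, reports the strongest; claim is FALSE.
clever = [max(loglik_evidence(False, rng) for _ in range(20))
          for _ in range(10_000)]

print(round(statistics.mean(honest), 2))  # close to 1.0
print(round(statistics.mean(clever), 2))  # positive, despite the claim being false
```

So the strength of what the clever arguer shows you tracks their search effort at least as much as the truth, which is why working off how persuasive the argument “should be” can fail.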
Ah. I take you to be saying that the quality of the clever arguer’s argument can be high variance, since there is a good deal of chance in the quality of evidence that cherry-picking is able to find. A good point. But, is it ‘too high’? Do we want to do something (beyond the strategy I sketched in the post) to reduce variance?
That seems about right.
A concern I didn’t mention in the post—it isn’t obvious how to respond to game-theoretic concerns. Carefully estimating the size of the update you should make when someone fails to provide good reason can be difficult, since you have to model other agents, and you might make exploitable errors.
An extreme way of addressing this is to ignore all evidence short of mathematical proof if you have any non-negligible suspicion about manipulation, similar to the mistake I describe myself making in the post. This seems too extreme, but it isn’t clear what the right thing to do overall is. The fully-Bayesian approach to estimating the amount of evidence should act similarly to a good game-theoretic solution, I think, but there might be reason to use a simpler strategy with less chance of exploitable patterns.
Thank you! Appreciative comments really help me to be less risk-averse about posting.
I like your suggestion well enough that I might edit the post. (I’ll let it sit a bit to see whether I change my mind.)
Maybe serial-access vs random-access, as in computer memory.
Yeah… I thought about this problem while writing, but didn’t think of an alternative I liked.
I’m curious what your guess was.