The LessWrongy framework I’m familiar with would say that value = expected utility, so it takes potential downsides into account. You’re not risk-averse wrt your VNM utility function, but computing that utility function is hard in practice, and EV calculations can benefit from some consideration of the tail-risks.
Schelling’s The Strategy of Conflict seems very relevant here; a major focus is precommitment as a bargaining tool. See here for an old review by cousin_it.
Iterated chicken seems fine to test, just as a spinoff of the IPD that maps to slightly different situations. (I believe that the iterated game of mutually modeling each other’s single-shot strategy is different from iterating the game itself, so I don’t think Abram’s post necessarily implies that iterated chicken is relevant to ASI blackmail solutions.)
Speaking of iterated games, one natural form of blackmail is for the blackmailee to pay an income stream to the blackmailer; that way, at each time-step they’re paying their fair price for the good of [not having their secret revealed between time t and time t+1]. Here’s a well-cited paper that discusses this idea in the context of nuclear brinksmanship: Schwarz & Sonin 2007.
It’s true the net effect is low to first order, but you’re neglecting second-order effects. If premia are important enough, people will feel compelled to Goodhart proxies used for them until those proxies have less meaning.
Given the linked siderea post, maybe this is not very true for insurance in particular. I agree that wasn’t a great example.
Slack-wise, uh, choices are bad. really bad. Keep the sabbath. These are some intuitions I suspect are at play here. I’m not interested in a detailed argument hashing out whether we should believe that these outweigh other factors in practice across whatever range of scenarios, because it seems like it would take a lot of time/effort for me to actually build good models here, and opportunity costs are a thing. I just want to point out that these ideas seem relevant for correctly interpreting Zvi’s position.
The post implies it is bad to be judged. I could have misinterpreted why, but that implication is there. If judge just meant “make inferences about” why would it be bad?
As Raemon says, knowing that others are making correct inferences about your behavior means you can’t relax. No, idk, watching soap operas, because that’s an indicator of being less likely to repay your loans, and your premia go up. There’s an ethos of slack, decisionmaking-has-costs, strategizing-has-costs that Zvi’s explored in his previous posts, and that’s part of how I’m interpreting what he’s saying here.
But it also helps in knowing who’s exploiting them! Why does it give more advantages to the “bad” side?
Sure, but doesn’t it help me against them too?
You don’t want to spend your precious time on blackmailing random jerks, probably. So at best, now some of your income goes toward paying a white-hat blackmailer to fend off the black-hats. (Unclear what the market for that looks like. Also, black-hatters can afford to specialize in unblackmailability; it comes up much more often for them than the average person.) You’re right, though, that it’s possible to have an equilibrium where deterrence dominates and the black-hatting incentives are low, in which case maybe the white-hat fees are low and now you have a white-hat deterrent. So this isn’t strictly bad, though my instinct is that it’s bad in most plausible cases.
Why would you expect the terrorists to be miscalibrated about this before the reduction in privacy, to the point where they think people won’t negotiate with them when they actually will, and less privacy predictably changes this opinion?
That’s a fair point! A couple of counterpoints: I think risk-aversion of ‘terrorists’ helps. There’s also a point about second-order effects again; the easier it is to blackmail/extort/etc., the more people can afford to specialize in it and reap economies of scale.
Perhaps the optimal set of norms for these people is “there are no rules, do what you want”. If you can improve on that, then that would constitute a norm-set that is more just than normlessness. Capturing true ethical law in the norms most people follow isn’t necessary.
Eh, sure. My guess is that Zvi is making a statement about norms as they are likely to exist in human societies with some level of intuitive-similarity to our own. I think the useful question here is like “is it possible to instantiate norms s.t. norm-violations are ~all ethical-violations”. (we’re still discussing the value of less privacy/more blackmail, right?) No-rule or few-rule communities could work for this, but I expect it to be pretty hard to instantiate them at large scale. So sure, this does mean you could maybe build a small local community where blackmail is easy. That’s even kind of just what social groups are, as Zvi notes; places where you can share sensitive info because you won’t be judged much, nor attacked as a norm-violator. Having that work at super-Dunbar level seems tough.
I found this pretty useful. Zvi’s definitely reflecting a particular, pretty negative view of society and strategy here. But I disagree with some of your inferences, and I think you’re somewhat exaggerating the level of gloom-and-doom implicit in the post.
>Implication: “judge” means to use information against someone. Linguistic norms related to the word “judgment” are thoroughly corrupt enough that it’s worth ceding to these, linguistically, and using “judge” to mean (usually unjustly!) using information against people.
No, this isn’t bare repetition. I agree with Raemon that “judge” here means something closer to one of its standard usages, “to make inferences about”. Though it also fits with the colloquial “deem unworthy for baring [understandable] flaws”, which is also a thing that would happen with blackmail and could be bad.
>Implication: more generally available information about what strategies people are using helps “our” enemies more than it helps “us”. (This seems false to me, for notions of “us” that I usually use in strategy)
I can imagine a couple things going on here? One, if the world is a place where many more vulnerabilities are known, this incentivizes more people to specialize in exploiting those vulnerabilities. Two, as a flawed human there are probably some stressors against which you can’t credibly play the “won’t negotiate with terrorists” card.
>Implication: even in the most just possible system of norms, it would be good to sometimes violate those norms and hide the fact that you violated them. (This seems incorrect to me!)
I think the assumption is these are ~baseline humans we’re talking about, and most human brains can’t hold norms of sufficient sophistication to capture true ethical law, and are also biased in ways that will sometimes strain against reflectively-endorsed ethics (e.g. they’re prone to using constrained circles of moral concern rather than universality).
>Implication: the bad guys won; we have rule by gangsters, who aren’t concerned with sustainable production, and just take as much stuff as possible in the short term. (This seems on the right track but partially false; the top marginal tax rate isn’t 100%)
This part of the post reminded me of (the SSC review of) Seeing Like a State, which makes a similar point; surveying and ‘rationalizing’ farmland, taking a census, etc. = legibility = taxability. “all of them” does seem like hyperbole here. I guess you can imagine the maximally inconvenient case where motivated people with low cost of time and few compunctions know your resources and full utility function, and can proceed to extract ~all liquid value from you.
The CHAI reading list is also fairly out of date (last updated April 2017) but has a few more papers, especially if you go to the top and select  or  so it shows lower-priority ones.
(And in case others haven’t seen it, here’s the MIRI reading guide for learning agent foundations.)
Oh wait, yeah, this is just an example of the general principle “when you’re optimizing for xy, and you have a limited budget with linear costs on x and y, the optimal allocation is to spend equal amounts on both.”
Formally, you can show this via Lagrange-multiplier optimization, using the Lagrangian $L(x,y) = xy - \lambda(ax + by - M)$. Setting the partials equal to zero gets you $\lambda = y/a = x/b$, and you recover the linear constraint function $ax + by = M$. So $ax = by = M/2$. (Alternatively, just optimizing $x(M - ax)/b$ works, but I like Lagrange multipliers.)
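As a quick numerical sanity check of the equal-spend claim, here is a minimal Python sketch. The function name `best_allocation` and the sample values of a, b, and M are illustrative assumptions, not anything from the original discussion. It brute-forces the budget line and compares the result with ax = by = M/2.

```python
def best_allocation(a, b, M, steps=100_000):
    """Grid-search the budget line a*x + b*y = M for the x that maximizes x*y."""
    best_x, best_val = 0.0, float("-inf")
    for i in range(steps + 1):
        x = (M / a) * i / steps      # x ranges over [0, M/a]
        y = (M - a * x) / b          # y is pinned down by the constraint
        if x * y > best_val:
            best_x, best_val = x, x * y
    return best_x, (M - a * best_x) / b


# Illustrative (assumed) prices and budget, not taken from the thread.
a, b, M = 2.0, 5.0, 10.0
x, y = best_allocation(a, b, M)
print(a * x, b * y)  # both should come out at about M/2
```

A grid search is cruder than the Lagrangian argument, but it makes no use of the first-order conditions, so it is an independent check.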
In this case, we want to maximize $pq + (1-p) r q_0 = p(q - r q_0) + r q_0$. Since $r q_0$ is a constant, this is equivalent to optimizing $p(q - r q_0)$. Let’s define $w = q - r q_0$, so we’re optimizing $pw$.
Our constraint function is defined by the tradeoff between $p$ and $w$. $p(k) = (.5 - p_0)k + p_0$, so $k = \frac{p - p_0}{.5 - p_0}$. $w(k) = (r-1) q_0 k + q_0 - r q_0 = (r-1) q_0 (k-1)$, so $k = -\frac{w}{(1-r) q_0} + 1 = \frac{p - p_0}{.5 - p_0}$.
Rearranging gives the constraint function $\frac{.5 - p_0}{(1-r) q_0} w + p = .5$. This is indeed linear, with a total ‘budget’ $M$ of .5 and a $p$-coefficient $b$ of 1. So by the above theorem we should have $1 \cdot p = .5/2 = .25$.
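The p = .25 result can be checked numerically with a short sketch. The function name `optimal_p` and the parameter values are assumptions chosen for illustration, and k is taken to range over [0, 1] as in the formulas above. Note that the interior optimum only exists when p0 < .25; otherwise the search stays at k = 0.

```python
def optimal_p(p0, q0, r, steps=100_000):
    """Scan k in [0, 1] and return the p(k) that maximizes p(k) * w(k)."""
    best_k, best_val = 0.0, float("-inf")
    for i in range(steps + 1):
        k = i / steps
        p = (0.5 - p0) * k + p0          # win probability rises linearly to .5
        w = (r - 1) * q0 * (k - 1)       # w = q - r*q0, falls linearly to 0
        if p * w > best_val:
            best_k, best_val = k, p * w
    return (0.5 - p0) * best_k + p0


# Assumed example values: p0 below .25, so the optimum is interior.
print(optimal_p(p0=0.1, q0=0.9, r=0.5))  # close to 0.25
# With p0 above .25 the best choice is to stay at k = 0, i.e. p = p0.
print(optimal_p(p0=0.3, q0=0.9, r=0.5))
```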
I think your solution to “reckless rivals” might be wrong? I think you mistakenly put a multiplier of q instead of a p on the left-hand side of the inequality. (The derivation of the general inequality checks out, though, and I like your point about discontinuous effects of capacity investment when you assume that the opponent plays a known pure strategy.)
I’ll use slightly different notation from yours, to avoid overloading p and q. (This ends up not mattering because of linearity, but eh.) Let $p_0, q_0$ be the initial probabilities for winning and safety|winning. Let $k$ be the capacity variable, and without loss of generality let $k$ start at 0 and end at $k_m$. Then $p(k) = \frac{.5 - p_0}{k_m} k + p_0$, and $q(k) = \frac{r q_0 - q_0}{k_m} k + q_0$. So $p' = \frac{.5 - p_0}{k_m}$, so $\frac{p}{p'} = \frac{p \, k_m}{.5 - p_0}$. And $q' = \frac{r q_0 - q_0}{k_m}$, so $-\frac{q'}{q} = \frac{q_0 (1-r)}{q \, k_m}$.
Therefore, the left-hand side of the inequality, $-\frac{p q'}{p' q}$, equals $\frac{p}{.5 - p_0} \cdot \frac{q_0 (1-r)}{q}$. At the initial point $k = 0$, this simplifies to $\frac{p_0}{.5 - p_0}(1-r)$.
Let’s assume $\alpha = 1$. The relative safety of the other project is $\beta = \frac{r q_0}{q}$, which at $k = 0$ simplifies to $r$.
Thus we should commit more to capacity when $1 - r > \frac{p_0}{.5 - p_0}(1-r)$, or $1 > \frac{p_0}{.5 - p_0}$, or $.25 > p_0$. This is a little weird, but makes a bit more intuitive sense to me than $q_0 + p_0$ or $q_0 - p_0$ mattering.
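Here is a small sketch that evaluates the left-hand side at k = 0 directly from the slopes of the linear p(k) and q(k), then applies the comparison stated above (with α = 1 and r < 1, as in this comment). The function names and sample numbers are hypothetical. It is only meant to confirm that the threshold sits at p0 = .25 and that q0 and km drop out.

```python
def lhs_at_start(p0, q0, r, km=1.0):
    """-(p * q') / (p' * q) at k = 0, for the linear p(k) and q(k) above."""
    p_slope = (0.5 - p0) / km        # p'
    q_slope = (r * q0 - q0) / km     # q' (negative when r < 1)
    return -(p0 * q_slope) / (p_slope * q0)


def favors_capacity(p0, q0, r, km=1.0):
    """The comparison from the text at k = 0: 1 - r > LHS (assumes alpha = 1)."""
    return (1 - r) > lhs_at_start(p0, q0, r, km)


# Assumed sample values on either side of the p0 = .25 threshold.
print(favors_capacity(p0=0.2, q0=0.9, r=0.5))  # True
print(favors_capacity(p0=0.3, q0=0.9, r=0.5))  # False
```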
Yeah, I worry that competitive pressure could convince people to push for unsafe systems. Military AI seems like an especially risky case. Military goals are harder to specify than “maximize portfolio value”, but there are probably reasonable proxies, and as AI gets more capable and more widely used there’s a strong incentive to get ahead of the competition.
Yeah, I think you’re right.* So it actually looks the same as the “TFTWF accidentally defects” case.
*assuming we specify TFTWF as “defect against DD, cooperate otherwise”. I don’t see a reasonable alternate definition. I think you’re right that defecting against DC is bad, and if we go to 3-memory, defecting against DDC while cooperating with DCD seems bad too.** Sarah can’t be assuming the latter, anyway, because the “TFTWF accidentally defects” case would look different.
**there might be some fairly reasonably-behaved variant that’s like “defect if >=2 of 3 past moves were D”, but that seems like a) probably bad since I just made it up and b) not what’s being discussed here.
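To make the two definitions in this comment concrete, here is a minimal sketch. It assumes TFTWF means "defect only if the opponent's last two moves were both D", and it includes the made-up 2-of-3 variant from the footnote. The `play` helper and its `forced` argument for injecting an accidental move are hypothetical scaffolding, not part of the original discussion.

```python
def tftwf(opp_history):
    """Tit-for-tat with forgiveness: defect only against DD, else cooperate."""
    return "D" if opp_history[-2:] == ["D", "D"] else "C"


def majority_of_three(opp_history):
    """Footnote variant: defect if at least 2 of the last 3 moves were D."""
    return "D" if opp_history[-3:].count("D") >= 2 else "C"


def play(strategy_a, strategy_b, rounds, forced=None):
    """Run an iterated game; `forced` maps (player, round) to an overridden move."""
    forced = forced or {}
    hist_a, hist_b = [], []
    for t in range(rounds):
        move_a = forced.get(("a", t), strategy_a(hist_b))
        move_b = forced.get(("b", t), strategy_b(hist_a))
        hist_a.append(move_a)
        hist_b.append(move_b)
    return "".join(hist_a), "".join(hist_b)


# Player a accidentally defects once in round 2; b forgives a single D.
print(play(tftwf, tftwf, 6, forced={("a", 2): "D"}))
# The two rules disagree on a DCD history.
print(tftwf(["D", "C", "D"]), majority_of_three(["D", "C", "D"]))
```

In the self-play run, the single accidental defection is absorbed and both players return to cooperation, which is the behavior the "defect against DD" definition is meant to give.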
I liked the playful writing here.
Maybe I’m being dumb, but I feel like spelling out some of your ideas would have been useful. (Or maybe you’re just playing with ~pre-rigor intuitions, and I’m overthinking this.)
I think “float to the top” could plausibly mean:
A. In practice, human nature biases us towards treating these ideas as if they were true.
B. Ideal reasoning implies that these ideas should be treated as if they were true.
C. By postulate, these ideas end up reaching fixation in society. [Which then implies things about what members of society can and can’t recognize, e.g. the existence of AIXI-like actors.]
Likewise, what level do you want a NAT to be implemented at? Personal behavior? Structure of group blog sites? Social norms?
I’ll echo the other commenters in saying this was interesting and valuable, but also (perhaps necessarily) left me to cross some significant inferential gaps. The biggest for me were in going from game-descriptions to equilibria. Maybe this is just a thing that can’t be made intuitive to people who haven’t solved it out? But I think that, e.g., graphs of the kinds of distributions you get in different cases would have helped me, at least.
I also had to think for a bit about what assumptions you were making here:
A more rigorous or multi-step process could have only done so much. To get better information, they would have had to add a different kind of test. That would risk introducing bad noise.
A very naive model says additional tests → uncorrelated noise → less noise in the average.
More realistically, we can assume that some dimensions of quality are easier to Goodhart than others, and you don’t know which are which beforehand. But then, how do you know your initial choice of test isn’t Goodhart-y? And even if the Goodhart noise is much larger than the true variation in skill, it seems like you can aggregate scores in a way that would allow you to make use of the information from the different tests without being bamboozled. (Depending on your use-case, you could take the average of a concave function of the scores, or use quantiles, or take the min score, etc.)
In reality, though, you usually have some idea what dimensions are important for the job. Maybe it’s something like PCA, with the noise/signal ratio of dimensions decreasing as you go down the list of components. Then that decrease, plus marginal costs of more tests, means that there is some natural stopping point. I guess that makes sense, but it took a bit for me to get there. Is that what you were thinking?
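A toy illustration of the aggregation point, using entirely made-up scores: one candidate is steady across three tests, the other has one wildly inflated (Goodharted) score. The `aggregate` function and its method names are assumptions for this sketch. It compares the plain mean with the min, the median, and the mean of a concave function (log).

```python
import math
import statistics


def aggregate(scores, method):
    """Combine per-test scores into one number using the named rule."""
    if method == "mean":
        return statistics.mean(scores)
    if method == "min":
        return min(scores)
    if method == "median":
        return statistics.median(scores)
    if method == "log_mean":  # mean of a concave transform of the scores
        return statistics.mean([math.log(s) for s in scores])
    raise ValueError(f"unknown method: {method}")


# Hypothetical data: `steady` is consistent, `gamed` has one inflated test.
steady = [7, 7, 7]
gamed = [3, 3, 20]

for method in ("mean", "min", "median", "log_mean"):
    winner = "steady" if aggregate(steady, method) > aggregate(gamed, method) else "gamed"
    print(method, "prefers", winner)
```

With these numbers only the plain mean is fooled by the inflated score; the other three rules cap how much any single test can contribute.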
A similar concept is the idea of offense-defense balance in international relations. E.g., large stockpiles of nuclear weapons strongly favor “defense” (well, deterrence) because it’s prohibitively costly to develop the capacity to reliably destroy the enemy’s second-strike forces. Note the caveats there: at sufficient resource levels, and given constraints imposed by other technologies (e.g. inability to detect nuclear subs).
Allan Dafoe and Ben Garfinkel have a paper out on how techs tend to favor offense at low investment and defense at high investment. (That is, the resource ratio R at which an attacker with resources RD has an X% chance of defeating a defender with resources D tends to decrease with D down to a local minimum, then increase.)
(On mobile, will link later.)
Well, it’s nonequilibrium, so pressure isn’t even at each layer of water any more...
When I picture this happening, there’s a pulse of high-pressure water below the rock. If you froze the rock’s motion while keeping its force on the water below it, I think the pulse would eventually equilibrate out of existence as water flowed to the side? Or if I imagine a fluid with strong drag forces on the rock, but which flows smoothly itself, it again seems plausible that the pressure equilibrates at the bottom.
(More confident in the first para than the second one.)
Hey, noticed what might be errors in your lesion chart: No lesion, no cancer should give +1m utils in both cases. And your probabilities don’t add to 1. Including p(lesion) explicitly doesn’t meaningfully change the EV difference, so eh. However, my understanding is that the core of the lesion problem is recognizing that p(lesion) is independent of smoking; EYNS seems to say the same. Might be worth including it to make that clearer?
(I don’t know much about decision theory, so maybe I’m just confused.)
I think what avturchin is getting at is that when you say “there is a 1⁄3 chance your memory is false and a 1⁄3 chance you are the original”, you’re implicitly conditioning only on “being one of the N total clones”, ignoring the answer to “do you remember the last split?”, which is highly informative. That is, if each clone fully conditioned on the information available to them, you’d get 0-.5-.5 as subjective probabilities due to your step 2.
If that’s not what you’re going for, it seems like maybe the probability you’re calculating is “probability that, given you’re randomly (uniformly) assigned to be one of the N people, you’re the original”. But then that’s obviously 1/N regardless of memory shenanigans.
If you think this is not what you’re saying, then I’m confused.
The idea of reducing hypotheses to bitstrings (i.e., programs to be run on a universal Turing machine) actually helped me a lot in understanding something about science that hindsight had previously cheapened for me. Looking back on the founding of quantum mechanics, it’s easy to say “right, they should have abandoned their idea of particles existing as point objects with definite position and adopted the concept and language of probability distributions, rather than assuming a particle really exists and is just ‘hidden’ by the wavefunction.” But the scientists of the day had a programming language in their heads where “particle” was a basic object and probability was something complicated that you had to build up; the optimization process of science had arrived at a local maximum in the landscape of possible languages to describe the world.
I realize this is a pretty simple insight, but I’m glad the article gave me a way to better understand this.