AI strategy & governance. Blog: Not Optional.
Zach Stein-Perlman
The Governance Problem and the “Pretty Good” X-Risk
Yes, I definitely consider (successful, philosophically sound) CEV to be a great use of superintelligence. An earlier draft mentioned CEV explicitly, but I decided to just mention the broader category “indirect normativity,” which should include any sound method for specifying values indirectly.
Thanks for writing this. Stories like this help me understand possibilities for the future (and understand how others think).
The US and many other Western governments are gridlocked, because the politicians are products of this memetic environment. People say it’s a miracle that the US isn’t in a civil war already.
So far in your vignette, AI is sufficiently important and has sufficient public attention that any functional government would be (1) regulating it, or at least exerting pressure on the shape of AI through the possibility of regulation, and especially (2) appreciating the national security implications of near-future AI. But in your vignette, governments fail to respond meaningfully to AI; they aren’t part of the picture (so far). This would surprise me. I don’t understand how epistemic decline on the internet translates into governments’ failure to respond to AI. How do you imagine this happening? I expect that the US federal government will be very important in the next decade, so I’m very interested in better understanding possibilities.
Also: does epistemic decline and social dysfunction affect AI companies?
I tentatively agree but “people realizing it’s more dangerous than nukes” has potential negative consequences too — an arms race is the default outcome of such national security threats/opportunities. I’ve recently been trying to think about different memes about AI and their possible effects… it’s possible that memes like “powerful AI is fragile” could get the same regulation and safety work with less arms racing.
Consequences (in expectation) if widely accepted: very good.
Compressibility: poor (at least, good compressions are not obvious).
Probability of (a compressed version) becoming widely accepted or Respectable Opinion: moderately low due to weirdness. Less weird explanations of why AI might not do what we want would be more Respectable and acceptable.
Leverage (i.e., increase in that probability from increased marginal effort to promote that meme): uncertain.
Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.
Binaries Are Analytically Valuable
Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have
really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable extra cost and delay) or
really not solved alignment (i.e., we cannot create a similarly powerful aligned AI, or doing so would require very unreasonable extra cost and delay)
(Whether this is actually true is irrelevant to my point.)
Why would this matter?
Stating the risk from an unaligned intelligence explosion is kind of awkward: it’s that the alignment tax is greater than what the leading AI project is able/willing to pay. Equivalently, our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay. This gives rise to two nice, clean desiderata:
Decrease the alignment tax
Increase what the leading AI project is able/willing to pay for alignment
But unfortunately, we can’t similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:
Make the alignment tax less than 6 months and a trillion dollars
Make the leading AI project able/willing to spend 6 months and a trillion dollars on aligning an AI
It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-to-pay front, and vice versa.
But if alignment success is binary, then we actually can decompose the goal (bolded above) into two necessary (and jointly sufficient) conditions:
Really solve alignment; i.e., reduce the alignment tax to [reasonable value]
Make the leading AI project able/willing to spend [reasonable value] on alignment
(Where [reasonable value] depends on what exactly our binary-ish probability distribution for alignment success looks like.)
Breaking big goals down into smaller goals—in particular, into smaller necessary conditions—is valuable, analytically and pragmatically. Binaries help, when they exist. Sometimes weaker conditions on the probability distribution, those of the form “a certain important subset of possibilities has very low probability,” can be useful in the same way.
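To make the decomposition concrete, here is a toy Monte Carlo sketch (all numbers and distributions are hypothetical illustrations, not estimates): in a binary-ish world, the event “alignment tax ≤ willingness to pay” holds exactly when both necessary conditions hold, so its probability is (given independence) approximately the product of theirs.

```python
import random

random.seed(0)

LOW_TAX = 1.0   # hypothetical "reasonable value" of the tax if alignment is really solved
HIGH_TAX = 1e9  # tax if really not solved: far beyond any plausible willingness to pay

def sample_tax():
    # Binary-ish world: the tax is either low (really solved) or astronomical.
    return LOW_TAX if random.random() < 0.5 else HIGH_TAX

def sample_willingness():
    # Hypothetical spread of what the leading project can/will pay (0.1 to 1000).
    return 10 ** random.uniform(-1, 3)

N = 100_000
samples = [(sample_tax(), sample_willingness()) for _ in range(N)]

p_success = sum(tax <= will for tax, will in samples) / N
p_solved = sum(tax == LOW_TAX for tax, _ in samples) / N     # condition 1
p_willing = sum(will >= LOW_TAX for _, will in samples) / N  # condition 2

# Because the tax is binary, success occurs exactly when BOTH conditions hold,
# so (with independent conditions) p_success is close to p_solved * p_willing.
print(p_success, p_solved * p_willing)
```

In a non-binary world, replacing `sample_tax` with a distribution spread over many scales breaks this factorization: no single threshold makes both conditions necessary.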
Great Power Conflict
Maybe AI Will Happen Outside US/China
I’m interested in the claim that important AI development (in the next few decades) will largely occur outside any of the states that currently look likely to lead AI development. I don’t think this is likely, but I haven’t seen discussion of this claim.[1] This would matter because it would greatly affect the environment in which AI is developed and affect which agents are empowered by powerful AI.
Epistemic status: brainstorm. May be developed into a full post if I learn or think more.
I. Causes
The big tech companies are in the US and China, and discussion often assumes that these two states have a large lead on AI development. So how could important development occur in another state? Perhaps other states’ tech programs (private or governmental) will grow. But more likely, I think, an already-strong company leaves the US for a new location.
My legal knowledge is insufficient to say with any confidence how easily companies can leave their states. My impression is that large American companies largely can leave while large Chinese companies cannot.
Why might a big tech company or AI lab want to leave a state?[2]
Fleeing expropriation/nationalization. States can largely expropriate companies’ property within their territory unless they have contracted otherwise. A company may be able to protect its independence by securing legal protection from expropriation from another state, then moving its hardware to that state. It may move its headquarters or workers as well.
Fleeing domestic regulation on development and/or deployment of AI.
II. Effects
The state in which powerful AI is developed has two important effects.
States set regulations. The regulatory environment around an AI lab may affect the narrow AI systems it builds and/or how it pursues AGI.
State influence & power. The state in which AGI is achieved can probably nationalize that project (perhaps well before AGI). State control of powerful AI affects how it will be used.
III. AI deployment before superintelligence
Eliezer recently tweeted that AI might be low-impact until superintelligence because of constraints on deployment. This seems partially right — for example, medicine and education seem like areas in which marginal improvements in our capabilities have only small effects due to civilizational inadequacy. Certainly some AI systems would require local regulatory approval to be useful; those might well be limited in the US. But a large fraction of AI systems won’t be prohibited by plausible American regulation. For example, I would be quite surprised if the following kinds of systems were prohibited by regulation (disclaimer: I’m very non-expert on near-future AI):
Business services
Operations/logistics
Analysis
Productivity tools (e.g., Codex, search tools)
Online consumer services — financial, writing assistants (Codex)
Production of goods that can be shipped cheaply (like computers but not houses)
Trading
Maybe media stuff (chatbots, persuasion systems). It’s really hard to imagine the US banning chatbots. I’m not sure how persuasion-AI is implemented; custom ads could conceivably be banned, but eliminating AI-written media is implausible.
This matters because these AI applications directly affect some places even if they couldn’t be developed in those places.
In the unlikely event that the US moves against not only the deployment but also the development of such systems, AI companies would be more likely to seek a way around regulation — such as relocating.
Ha, I took intro IR last semester so I should have caught this. Fixed, thanks.
Thanks for your comment.
It’s unclear how strongly the individual actors are controlled by their respective governments.
Good point. If I understand right, this is an additional risk factor: there’s a risk of violence that neither state wants due to imperfect internal coordination, and this risk generally increases with international tension, number of humans in a position to choose to act hostile or attack, general confusion, and perhaps the speed at which conflict occurs. Please let me know if you were thinking something else.
The countries that are players are all different, so you lose insight when you talk about Albania and Botswana instead of the real players.
Of course. I did acknowledge this: “Consideration of more specific factors, such as what conflict might look like between specific states or involving specific technologies, is also valuable but is not my goal here.” I think we can usefully think about conflict without considering specific states. Focusing on, say, US-China conflict might obscure more general conclusions.
Given Russia tolerating all the ransomware attacks being launched from their soil, it could be that one US president says “Enough, if Russia doesn’t do anything against attacks from their soil on the West, let’s decriminalize hacking Russian targets”.
Hmm, I haven’t heard this suggested before. This would greatly surprise me (indeed, I’m not familiar with domestic or international law for cyber stuff, but I would be surprised to learn that US criminal law was the thing stopping cyberattacks on Russian organizations from US hackers or organizations). And I’m not sure how this would change the conflict landscape.
Oh, interesting.
Speaking about states wanting things obscures a lot.
So I assume you would frame states as less agenty and frame the source of conflict as decentralized — arising from the complex interactions of many humans, which are less predictable than “what states want” but still predictably affected by factors like bilateral tension/hostility, general chaos, and various technologies in various ways?
I strongly support SIA over SSA. I haven’t read this sequence yet. But it looks like the sequence is about why the consequences of SIA are superior to those of SSA. This is a fine project. But a reason for SIA over SSA at least as strong as its more acceptable consequences, I think, is its greater theoretical coherence.
SIA says: given your prior, multiply every possible universe by the number/volume of observers indistinguishable from you in that universe, then normalize. This is intuitive, it has a nice meaning,* and it doesn’t have a discontinuity at zero observers.
*Namely: I’m a random member of the prior-probability-weighted set of possible observers indistinguishable from me.
For SSA, on the other hand, it’s hard to even explicate the anthropic update. But I think any formalization will require treating the update to zero probability for zero-indistinguishable-observers as a special case.
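The SIA rule above is simple enough to compute directly. A minimal sketch with made-up numbers (universe names and priors are hypothetical), including a zero-observer universe to show that no special case is needed:

```python
# Prior over possible universes (hypothetical numbers).
priors = {"small": 0.4, "big": 0.4, "empty": 0.2}
# Number of observers indistinguishable from you in each universe.
observers = {"small": 1, "big": 100, "empty": 0}

# SIA: multiply each universe's prior by its count of observers like you,
# then normalize. The zero-observer universe smoothly gets probability 0.
weights = {u: priors[u] * observers[u] for u in priors}
total = sum(weights.values())
sia_posterior = {u: w / total for u, w in weights.items()}

print(sia_posterior)  # the big universe dominates; "empty" is exactly 0
```

Note that `"empty"` falls out of the same multiply-and-normalize step as everything else, which is the continuity-at-zero-observers point: SSA-style formalizations have to handle that case separately.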
You want “to build a thriving in-person rationality and longtermism community in the Bay Area.” That sounds great. How do you plan to do it, at any level of generality? ‘Thriving community’ can mean a lot of different things.
Ha, I wrote a comment like yours but slightly worse, then refreshed and your comment appeared. So now I’ll just add one small note:
To the extent that (1) normatively, we care much more about the rest of the universe than our personal lives/futures, and (2) empirically, we believe that our choices are much more consequential if we are non-simulated than if we are simulated, we should in practice act as if there are greater odds that we are non-simulated than we have reason to believe for purely epistemic purposes. So in practice, I’m particularly interested in (C) (and I tentatively buy SIA doomsday as explained by Katja Grace).
Edit: also, isn’t the last part of this sentence from the post wrong:
SIA therefore advises not that the Great Filter is ahead, but rather that we are in a simulation run by an intergalactic human civilization, without strong views on late filters for unsimulated reality.
Ah, I agree. I misread that bit as about filters for us given that we are non-simulated, but really it’s about filters for non-simulated civilizations, which under the simulation argument our existence doesn’t tell us much about. Thanks.
I agree with all of this (and I admire its clarity). In addition, I believe that the SIA-formulated questions are generally the important ones, for roughly the reason that the consequences of our choices are generally more like “value is proportional to correct actions” than “value is proportional to fraction of actions correct” (across all observers subjectively indistinguishable from me). (Our choices seem to be local in some sense; their effects are independent of the choices of our faraway subjectively-indistinguishable counterparts, and their effects seem to scale with our numbers. Perhaps some formalization of “bigger universes matter more” is equivalent.)
I’m not sure about this, but perhaps with some kind of locality assumption, the intuitive sense of probability as something like odds at which I’m indifferent to bet (under certain idealizations) reduces to SIA probability, whereas SSA probability would correspond to something like odds at which I’m indifferent to bet if the value from winning is proportional to the fraction rather than the number of correct bets. Again, SSA is in conflict with “bigger universes matter more”; assuming locality, this is particularly disturbing since it roughly means that the value of a choice is inversely proportional to the number of similarly-situated choosers.
I mostly agree. Two thoughts:
Rather than thinking in terms of wages, w(t), I think we should just think in terms of time-value or marginal utility, u(t). Clearly everything you say applies to all value-we-can-get-from-time, not just wages.
Some of your conclusions (e.g., “if you would be willing to trade an hour for $1000 in the future [i.e., and gain the hour], you should also be willing to do so now”) only apply when the following is true: for the rest of the person’s life, their value-from-time is a (nondecreasing) function of their time spent working in the past. This is a plausible approximation in many cases. But both of the following are plausible:
working/experience will have a greater effect on u in the future, after I’m better-credentialed (I’m currently an undergraduate; the same number of hours of experience counts for more if it’s done in a higher-status position at a higher-status organization; working first in a lower-status way has some career benefits, but it’s not like the next t hours of my work experience will have the same effect on my long-term prospects regardless of whether I get a low-status job now or a high-status job after college)
I will have higher direct time-value in the future due to reasons that are nearly independent of how I spend my marginal time now.
More generally, it seems sometimes true that e.g. doing 60 hrs/wk for 1 year is less valuable for u than 30 hrs/wk for 2 years (because value from work is generally a function not just of experience/productivity but also legibility-of-experience). If I could save up marginal time now to spend later, I would (not just because I expect some future time to be higher-leverage because of TAI, but also because I expect to have higher direct time-value in the future in a way that I largely can’t affect by spending more time productively now); I’m not sure where I’d draw the line, but I’m pretty sure I’d save up time even if it cost 2 hours to give a single hour to future me.
(I’m not great at Markdown; please let me know if there’s a way to make the above paragraph part of the second bullet point)
[Question] Rationalism for New EAs
Consequentialism might harm survival
In general, the correctness of [a principle] is one matter; the correctness of accepting it, quite another. I think you conflate the claims “consequentialism is true” and “naive consequentialist decision procedures are optimal.” Even if we have decisive epistemic reason to accept consequentialism (of some sort), we may have decisive moral or prudential reason to use non-consequentialist decision procedures. So I would at least narrow your claims to consequentialist decision procedures.
evolution as a force typically acts on collectives, not individuals.
I’m not sure what you’re asserting here or how it’s relevant. Can you be more specific?
Value Is Binary
Epistemic status: rough ethical and empirical heuristic.
Assuming that value is roughly linear in resources available after we reach technological maturity,[1] my probability distribution of value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between −1% and 1% of the value of the optimal future), and little probability to anything else.[2] To the extent that almost all of the probability mass fits into two buckets, and everything within a bucket is almost equally valuable as everything else in that bucket, the goal “maximize expected value” reduces to the goal “maximize the probability of the better bucket.”
So rather than thinking about how to maximize expected value, I generally think about maximizing the probability of a great (i.e., near-optimal) future. This goal is easier for me to think about, particularly since I believe that the paths to a great future are rather homogeneous — alike not just in value but in high-level structure. In the rest of this shortform, I explain my belief that the future is likely to be near-optimal or near-zero.
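The reduction can be stated as a one-line approximation (toy numbers, purely illustrative): if nearly all probability mass sits at value ~1 (near-optimal) or ~0 (near-zero), expected value differs from the probability of the near-optimal bucket by at most the leftover mass.

```python
def expected_value(p_good, p_zero, p_other, v_other=0.5):
    """Toy bimodal distribution: near-optimal futures are worth ~1,
    near-zero futures ~0, and a little leftover mass sits elsewhere."""
    assert abs(p_good + p_zero + p_other - 1.0) < 1e-9
    return p_good * 1.0 + p_zero * 0.0 + p_other * v_other

# With only 2% of mass outside the two buckets, EV tracks P(good bucket)
# to within 0.02, so maximizing EV ~ maximizing P(near-optimal future).
ev = expected_value(p_good=0.30, p_zero=0.68, p_other=0.02)
print(ev)  # 0.31
```

The approximation degrades exactly as `p_other` grows, which is why the heuristic is explicitly conditional on the distribution being nearly binary.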
Substantial probability to near-optimal futures.
I have substantial credence that the future is at least 99% as good as the optimal future.[3] I do not claim much certainty about what the optimal future looks like — my baseline assumption is that it involves increasing and improving consciousness in the universe, but I have little idea whether that would look like many very small minds or a few very big minds. Or perhaps the optimal future involves astronomical-scale acausal trade. Or perhaps future advances in ethics, decision theory, or physics will have unforeseeable implications for how a technologically mature civilization can do good.
But uniting almost all of my probability mass for near-optimal futures is how we get there, at a high level: we create superintelligence, achieve technological maturity, solve ethics, and then optimize. Without knowing what this looks like in detail, I assign substantial probability to the proposition that humanity successfully completes this process. And I think almost all futures in which we do complete this process look very similar: they have nearly identical technology, reach the same conclusions on ethics, have nearly identical resources available to them (mostly depending on how long it took them to reach maturity), and so produce nearly identical value.
Almost all of the remaining probability to near-zero futures.
This claim is bolder, I think. Even if it seems reasonable to expect a substantial fraction of possible futures to converge to near-optimal, it may seem odd to expect almost all of the rest to be near-zero. But I find it difficult to imagine any other futures.
For a future to not be near-zero, it must involve using a nontrivial fraction of the resources available in the optimal future (by my assumption that value is roughly linear in resources). More significantly, the future must involve using resources at a nontrivial fraction of the efficiency of their use in the optimal future. This seems unlikely to happen by accident. In particular, I claim:
If a future does not involve optimizing for the good, value is almost certainly near-zero.
Roughly, this holds if all (nontrivially efficient) ways of promoting the good are not efficient ways of optimizing for anything else that we might optimize for. I strongly intuit that this is true; I expect that as technology improves, efficiently producing a unit of something will produce very little of almost all other things (where “thing” includes not just stuff but also minds, qualia, etc.).[4] If so, then value (or disvalue) is (in expectation) a negligible side effect of optimization for other things. And I cannot reasonably imagine a future optimized for disvalue, so I think almost all non-near-optimal futures are near-zero.
So I believe that either we optimize for value and get a near-optimal future, or we do anything else and get a near-zero future.
Intuitively, it seems possible to optimize for more than one value. I think such scenarios are unlikely. Even if our utility function has multiple linear terms, unless there is some surprisingly good way to achieve them simultaneously, we optimize by pursuing one of them near-exclusively.[5] Optimizing a utility function that looks more like min(x,y) may be a plausible result of a grand bargain, but such a scenario requires that, after we have mature technology, multiple agents have nontrivial bargaining power and different values. I find this unlikely; I expect singleton-like scenarios and that powerful agents will either all converge to the same preferences or all have near-zero-value preferences.
I mostly see “value is binary” as a heuristic for reframing problems. It also has implications for what we should do: to the extent that value is binary (and to the extent that doing so is feasible), we should focus on increasing the probability of great futures. If a “catastrophic” future is one in which we realize no more than a small fraction of our value, then a great future is simply one which is not catastrophic and we should focus on avoiding catastrophes. But of course, “value is binary” is an empirical approximation rather than an a priori truth. Even if value seems very nearly binary, we should not reject contrary proposed interventions[6] or possible futures out of hand.
I would appreciate suggestions on how to make these ideas more formal or precise (in addition to comments on what I got wrong or left out, of course). Also, this shortform relies on argument by “I struggle to imagine”; if you can imagine something I cannot, please explain your scenario and I will justify my skepticism or update.
You would reject this if you believed that astronomical-scale goods are not astronomically better than Earth-scale goods or if you believed that some plausible Earth-scale bad would be worse than astronomical-scale goods are good.
“Optimal” value is roughly defined as the expected value of the future in which we act as well as possible, from our current limited knowledge about what “acting well” looks like. “Zero” is roughly defined as any future in which we fail to do anything astronomically significant. I consider value relative to the optimal future, ignoring uncertainty about how good the optimal future is — we should theoretically act as if we’re in a universe with high variance in value between different possibilities, but I don’t see how this affects what we should choose before reaching technological maturity.*
*Except roughly that we should act with unrealistically low probability that we are in a kind of simulation in which our choices matter very little or have very differently-valued consequences than otherwise. The prospect of such simulations might undermine my conclusions—value might still be binary, but for the wrong reason—so it is useful to be able to almost-ignore such possibilities.
That is, at least 99% of the way from the zero-value future to the optimal future.
If we particularly believe that value is fragile, we have an additional reason to expect this orthogonality. But I claim that different goals tend to be orthogonal at high levels of technology independent of value’s fragility.
This assumes that all goods are substitutes in production, which I expect to be nearly true with mature technology.
That is, those that affect the probability of futures outside the binary or that affect how good the future is within the set of near-zero (or near-optimal) futures.