Research Associate at the Transformative Futures Institute, formerly of the MTAIR project and the Center on Long-term Risk. Graduate researcher at King's and AI MSc at Edinburgh. Interested in philosophy, longtermism and AI alignment.
Sammy Martin
The remedies for all our diseases will be discovered long after we are dead; and the world will be made a fit place to live in, after the death of most of those by whose exertions it will have been made so. It is to be hoped that those who live in those days will look back with sympathy to their known and unknown benefactors.
Almost 2 years to the day since we had an effective test run for X risks, we encounter a fairly significant global X risk factor.
As Harari said, it's time to revise upward your estimates of the likelihood of every X-risk scenario that could take place over the next 30 years or so, if those estimates assumed a 'normal' level of international tension between major powers rather than something more like the Cold War. This is significant especially for nuclear and bio risk, but also for AI if you assume slow takeoff.
I live in Southern England and so have a fair bit of personal investment in all this, but I’ll try to be objective. My first reaction, upon reading the LSHTM paper that you referred to, is ‘we can no longer win, but we can lose less’ - i.e. we are all headed for herd immunity one way or another by mid-year, but we can still do a lot to protect people. That would have been my headline—it’s over for suppression and elimination, but ‘it’s over’ isn’t quite right. Your initial reaction was different:
Are We F***ed? Is it Over?
Yeah, probably. Sure looks like it.
The twin central points last week were that we were probably facing a much more infectious strain (70%), and that if we are fucked in this way, then it is effectively already over in the sense that our prevention efforts would be in vain.
The baseline scenario remains, in my mind, that the variant takes over some time in early spring, the control system kicks in as a function of hospitalizations and deaths so with several weeks lag, and likely it runs out of power before it stabilizes things at all, and we blow past herd immunity relatively quickly combining that with our vaccination efforts.
You give multiple reasons to expect this, all of which make complete sense: lockdown fatigue, the inefficiency of prevention, lags in control systems, control systems being unable to compensate, etc. I could give similar reasons to expect the alternative—mainly that the MNM predicts the extreme strength of control systems, and that many places in Europe/Australia did take Rt down to 0.6 or even below!
But luckily, none of that is necessary.
This preprint model via the LessWrong thread has a confidence interval for increased infectiousness of 50%-74%.
I would encourage everyone to look at the scenarios in this paper, since they neatly explain exactly what we're facing and mean we don't have to rely on guesstimate models and inference about behaviour changes. This model is likely highly robust—it successfully predicted the course of the UK's previous lockdown, with whatever compliance we had then. They simply updated it by putting in the increased infectiousness of the new variant. Since that last lockdown was very recent, compliance isn't going to be wildly different; the weather was cold then too, schools were open, etc. The estimate for the increase in R given in this paper seems to be the same as that given by other groups, e.g. Imperial College.
So what does the paper imply? Essentially, a Level 4 lockdown (median estimate) flattens out case growth, but with schools closed an L4 lockdown causes cases to decline a bit (page 10). Increasing the vaccination rate tenfold, from 200,000 to 2 million, reduces the overall number of deaths by more than half (page 11). And they only model a one-month lockdown, but even that makes a significant difference to overall deaths (page 11). We managed 500k vaccinations in the first week, and it dropped a bit in the second week, but with first-doses-first and the Oxford/AZ vaccine it should increase again and land somewhere between those two scenarios. Who knows where? For the US, the fundamental situation may look like the first model (no lockdowns at all), so have a look.
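A rough way to see why a lockdown that previously crushed cases now only flattens them: multiply the Rt achieved under the old measures by the variant's transmissibility advantage. The sketch below is my own back-of-the-envelope arithmetic, not output from the LSHTM model; the multiplier uses a central value from the 50-74% range quoted above, and the pre-variant Rt figures are illustrative assumptions chosen only to mirror the paper's qualitative findings.

```python
# Back-of-the-envelope only: the multiplier is a central value from the 50-74%
# range quoted above; the pre-variant Rt figures are illustrative assumptions,
# not outputs of the LSHTM model.
multiplier = 1.56

# For cases to keep falling, the old-variant Rt must be below 1 / multiplier.
print(f"Old-variant Rt must be below {1 / multiplier:.2f} for cases to decline")

scenarios = [("strict lockdown, schools closed (assumed Rt)", 0.60),
             ("strict lockdown, schools open (assumed Rt)", 0.64),
             ("lighter restrictions (assumed Rt)", 0.80)]
for label, old_rt in scenarios:
    new_rt = old_rt * multiplier
    trend = "declining" if new_rt < 0.97 else ("roughly flat" if new_rt < 1.03 else "growing")
    print(f"{label}: {old_rt:.2f} -> {new_rt:.2f} ({trend})")
```

The takeaway is the threshold: unless the old measures were already getting Rt below roughly 0.64, the same measures now give growth or, at best, a plateau, which fits with the Rt of around 0.6 that the strictest European/Australian lockdowns achieved.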
(Also of note: the peak demand on the medical system, even in the bad scenarios with a Level 4 lockdown and schools open, is less than 1.5x what was seen during the first peak. That's certainly enough to boost the IFR, and could be described as 'healthcare system collapse' since it means surge capacity being used and healthcare workers being wildly overstretched, but to my mind 'collapse' refers to demand that exceeds supply by many multiples, such that most people can't get any proper care at all—as was talked about in late Feb/early March.)
(Edit: the level of accuracy of the LSHTM model should become clear in a week or two)
The nature of our situation now is such that every day of delay and every extra vaccinated person makes us incrementally better off.
This is a simpler situation than before. Before, we had the option of suppression, which is all-or-nothing: either you get R under 1 or you don't. The race we're in now, where short lockdowns that temporarily hold off the virus buy useful time, and speeding up vaccination raises herd immunity, cuts deaths and slackens the burden on the medical system, is a straightforward fight by comparison. You just do whatever you can to beat it back and vaccinate as fast as you can.
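To see why the race framing is right, here is a minimal toy SIR-with-vaccination sketch (my own illustration, not the LSHTM model; every parameter is an assumed round number, and vaccination is assumed to give immediate full protection). It only shows the direction of the effects: a temporary lockdown delays the wave, faster vaccination removes susceptible people before the wave arrives, and both together cut deaths the most.

```python
# A toy discrete-time SIR model with vaccination. Purely illustrative (my own
# sketch, NOT the LSHTM model); every parameter is an assumed round number,
# and vaccination is assumed to give immediate full protection.

def toy_run(vax_per_day, lockdown_days, days=180, pop=67_000_000):
    S, I, R = pop - 50_000.0, 50_000.0, 0.0
    gamma = 1 / 5                  # assumed 5-day infectious period
    ifr = 0.007                    # assumed infection fatality rate
    deaths = 0.0
    for t in range(days):
        rt = 0.9 if t < lockdown_days else 1.5   # assumed Rt with / without lockdown
        new_inf = rt * gamma * I * S / pop
        recovered = gamma * I
        vaxed = min(vax_per_day, max(S - new_inf, 0.0))   # vaccinate remaining susceptibles
        S -= new_inf + vaxed
        I += new_inf - recovered
        R += recovered + vaxed
        deaths += ifr * new_inf
    return int(deaths)

# Slow vs fast vaccination, with and without a one-month lockdown:
for vax, lock in [(30_000, 0), (30_000, 30), (300_000, 0), (300_000, 30)]:
    print(f"{vax:>7,}/day vaccinated, {lock:>2}-day lockdown: ~{toy_run(vax, lock):,} deaths")
```

Nothing here depends on the exact numbers; the point is that, unlike suppression, the benefits are continuous in both the delay you buy and the vaccination rate you achieve.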
Now, I don’t think you really disagree with me here, except about some minor factual details (I reckon your pre-existing intuitions about what ‘Level 4 lockdown’ would be capable of doing are different to mine), and you mention the extreme urgency of speeding up vaccine rollout often,
We also have a vaccination crisis. With the new strain coming, getting as many people vaccinated as fast as possible becomes that much more important.
...
With the more reasonable version of this being “we really really really should do everything to speed up our vaccinations, everyone, and to focus them on those most likely to die of Covid-19.” That’s certainly part of the correct answer, and likely the most important one for us as a group.
But if I were writing this, my loud headline message would not have been ‘It’s over’, because none of this is over, many decisions still matter. It’s only ‘over’ for the possibility of long term suppression.
*****
There’s also the much broader point—the ‘what, precisely, is wrong with us’ question. This is very interesting and complex and deserves a long discussion of its own. I might write one at some point. I’m just giving some initial thoughts here, partly a very delayed response to your reply to me 2 weeks ago (https://www.lesswrong.com/posts/Rvzdi8RS9Bda5aLt2/covid-12-17-the-first-dose?commentId=QvYbhxS2DL4GDB6hF). I think we have a hard-to-place disagreement about some of the ultimate causes of our coronavirus failures.
We got a shout-out in Shtetl-Optimized, as he offers his “crackpot theory” that if we were a functional civilization we might have acted like one and vaccinated everyone a while ago
...
I think almost everyone on earth could have, and should have, already been vaccinated by now. I think a faster, “WWII-style” approach would’ve saved millions of lives, prevented economic destruction, and carried negligible risks compared to its benefits. I think this will be clear to future generations, who’ll write PhD theses exploring how it was possible that we invented multiple effective covid vaccines in mere days or weeks
He's totally right on the facts, of course. The question is what to blame. I think our disagreement here, as revealed in our last discussion, is interesting. The first-order answer is institutional sclerosis: an inability to properly do expected-value reasoning and respond rapidly to new evidence. We all agree on that and all see the problem. You said to me,
And I agree that if government is determined to prevent useful private action (e.g. “We have 2020 values”)...
Implying, as you’ve said elsewhere, that the malaise has a deeper source. When I said “2020 values” I referred to our overall greater valuation of human life, while you took it to refer to our tendency to interfere with private action—something you clearly think is deeply connected to the values we (individuals and governments) hold today.
I see a long term shift towards a greater valuation of life that has been mostly positive, and some other cause producing a terrible outcome from coronavirus in western countries, and you see a value shift towards higher S levels that has caused the bad outcomes from coronavirus and other bad things.
Unlike Robin Hanson, though, you aren’t recommending we attempt to tell people to go off and have different values—you’re simply noting that you think our tendency to make larger sacrifices is a mistake.
″...even when the trade-offs are similar, which ties into my view that simulacra and maze levels are higher, with a larger role played by fear of motive ambiguity.”
This is probably the crux. I don't think we tend to go to higher simulacra levels now, compared to decades ago. I think operating at high simulacra levels has always been quite prevalent, and roughly constant through history. While signalling explanations definitely tell us a lot about particular failings, they can't explain why things are worse now in certain ways, compared to before. The difference isn't down to the perennial problem of pervasive signalling; it has more to do with economic stagnation and insufficient state capacity. These flaws mean useful action gets replaced by useless action, and allow more room for wasteful signalling.
As one point in favour of this model, I think it’s worth noting that the historical comparisons aren’t ever to us actually succeeding at dealing with pandemics in the past, but to things like “WWII-style” efforts—i.e. thinking that if we could just do x as well as we once did y then things would have been a lot better.
This implies that if you made an institution analogous to e.g. the weapons researchers of WW2 and the governments that funded them, or NASA in the 1960s, without copy-pasting 1940s/1960s society wholesale, the outcome would have been better. To me that suggests it’s institution design that’s the culprit, not this more ethereal value drift or increase in overall simulacra levels. There are other independent reasons to think the value shift has been mostly good, ones I talked about in my last post.
As a corollary, I also think that your mistaken predictions in the past (that we'd give up on suppression, or that the control system would fizzle out) are related to this. If you think we operate at higher S levels than in the past, you'd be more inclined to think we'll sooner or later sleepwalk into a disaster. If you think there is a strong, consistent S1 drag away from disaster, as I argued way back here, you'd expect strong control-system effects that seem surprisingly immune to 'fatigue'.
On the economic front, we would have had to choose either to actually suppress the virus, in which case we get much better outcomes all around, or to accept that the virus couldn’t be stopped, *which also produces better economic outcomes.*
Our technological advancement gave us the choice to make massively larger Sacrifices to the Gods rather than deal with the situation. And as we all know, choices are bad. We also are, in my model, much more inclined to make such sacrifices now than we were in the past,
So, by ‘Sacrifices to the Gods’ I assume you’re referring to the entirety of our suppression spending—because it’s not all been wasted money, even if a large part of it has. In other places you use that phrase to refer specifically to ineffective preventative measures.
‘We also are, in my model, much more inclined to make such sacrifices now than we were in the past’ - this is a very important point that I’m glad you recognise—there has been a shift in values such that we (as individuals, as well as governments) are guaranteed to take the option of attempting to avoid getting the virus and sacrificing the economy to a greater degree than in 1919, or 1350, because our society values human life and safety differently.
And realistically, if we’d approached this with pre-2020 values and pre-2020 technology, we’d have ‘chosen’ to let the disease spread and suffered a great deal of death and destruction—but that option is no longer open to us. For better, as I think, or for worse, as you think.
You can do an abstract cost-benefit calculation about whether the collateral harms of prevention have caused more damage than the disease itself, but it won't tell you anything about whether getting governments to stop lockdowns and suppression measures would be better or worse than having them keep trying. Robin Hanson directly confuses these two in his argument that we are over-preventing covid.
We see variations in both kinds of policy across space and time, due both to private and government choices, all of which seem modestly influenceable by intellectuals like Caplan, Cowen, and I...
But we should also consider the very real possibility that the political and policy worlds aren’t very capable of listening to our advice about which particular policies are more effective than others. They may well mostly just hear us say “more” or “less”, such as seems to happen in medical and education spending debates.
Here Hanson equivocates: he (correctly) identifies the entire cost of COVID-19 prevention as due to 'both private and government choices', but then focuses on just 'the political and policy worlds' when asking whether we should argue for less prevention. The claim (which may or may not be true) that 'we are, overall, over-preventing covid relative to the abstract alternative where we don't' gets equated with 'therefore telling people to reduce overall spending on covid prevention will be beneficial on cost-benefit terms'.
Telling governments to spend less money is much more likely to actually happen than ordering people to have different values. So in practice, making governments spend less on covid prevention diminishes their relatively effective preventative actions while doing very little about the source of most of the cost of covid prevention (individual action).
Like-for-like comparisons where values are similar but policy is different (like Sweden and its neighbours) make it clear that, given the underlying values we have and the behaviours those values have produced this year, the imperative 'prevent covid less' leads to outcomes that are worse across the board.
Or consider Sweden, which had a relatively non-panicky Covid messaging, no matter what you think of their substantive policies. Sweden didn’t do any better on the gdp front, and the country had pretty typical adverse mobility reactions. (NB: These are the data that you don’t see the “overreaction” critics engage with — at all. And there is more where this came from.)
How about Brazil? While they did some local lockdowns, they have a denialist president, a weak overall response, and a population used to a high degree of risk. The country still saw a gdp plunge and lots of collateral damage. You might ponder this graph, causality is tricky and the “at what margin” question is trickier yet, but it certainly does not support what Bryan is claiming about the relevant trade-offs.
So, with the firm understanding that given the values we have, and the behaviour patterns we will inevitably adopt, telling people to prevent the pandemic less is worse economically and worse in terms of deaths, we can then ask the further, more abstract question that you ask—what if our values were different? That is, what if the option was available to us because we were actually capable of letting the virus rip.
I wanted to put that disclaimer in because discussing whether we have developed the right societal values is irrelevant for policy decisions going forward—but still important for other reasons. I'd be quite concerned if our value drift over the last century or so was revealed to be overall maladaptive, but it's important to talk about the fact that this is the question that's at stake when we ask if society is over-preventing covid. I am not asking whether lockdowns or suppression are worth it now—they are.
You seem to think that our values should be different; that it’s at least plausible that signalling is leading us astray and causing us to overvalue the direct damage of covid, like lives lost, in place of concern for overall damage. Unlike Robin Hanson, though, you aren’t recommending we attempt to tell people to go off and have different values—you’re simply noting that you think our tendency to make larger sacrifices is a mistake.
...even when the trade-offs are similar, which ties into my view that simulacra and maze levels are higher, with a larger role played by fear of motive ambiguity. We might have been willing to do challenge trials or other actual experiments, and have had a much better handle on things quicker on many levels.
There are two issues here. One is that it's not at all clear whether the initial cost-benefit calculation about over-prevention is even correct. You don't claim to know if we are over-preventing in this abstract sense (compared to us having different values and individually not avoiding catching the disease), and the evidence that we are over-preventing comes from a Twitter poll of Bryan Caplan's extremely libertarian-inclined followers, whose answers are supposed to count as objective assessments of the pandemic's costs because he asked them what 'the average American' would value (come on!). Tyler Cowen briefly alludes to how woolly the numbers are here: 'I don't agree with Bryan's numbers, but the more important point is one of logic'.
The second issue is whether our change in values is an aberration caused by runaway signalling or reflects a legitimate, correct valuation of human life. Now, the fact that a lot of our prevention spending has been wasteful counts in favour of the signalling explanation, but on the other hand there's a ton of evidence that we in the past, in general, valued life too little. There's also the point that this seems like exactly the kind of case where a signalling explanation is hard to falsify, an issue I talked about here:
I worry that there is a tendency to adopt self-justifying signalling explanations, where an internally complicated signalling explanation that’s hard to distinguish from a simpler ‘lying’ explanation, gets accepted, not because it’s a better explanation overall but just because it has a ready answer to any objections. If ‘Social cognition has been the main focus of Rationality’ is true, then we need to be careful to avoid overusing such explanations. Stefan Schubert explains how this can end up happening:
I think the correct story is that the value shift has been good and bad—valuing human life more strongly has been good, but along with that it's become more valuable to credibly fake valuing human life, which has been bad.
Inner Alignment / Misalignment is possibly the key specific mechanism which fills a weakness in the 'classic arguments' for AI safety—the Orthogonality Thesis, Instrumental Convergence and Fast Progress, which together imply that small separations between AI alignment and AI capability can lead to catastrophic outcomes. To have a solid, specific reason to expect dangerous misalignment, we need an answer to the question of why there would be such a damaging, hard-to-detect divergence between the goals a system actually pursues and the goals we intended, and Inner Misalignment is just such a reason.
I think that it should be presented in initial introductions to AI risk alongside those classic arguments, as the specific, technical reason why the specific techniques we use are likely to produce such goal/capability divergence—rather than the general a priori reasons given by the classic arguments.
After listening to the recent podcast on scrutinizing arguments for AI risk, I figured this was an opportunity to scrutinize what the argument is. Those two previous links summarize how I think the classic arguments for AI risk inform our current views about AI risk, and I’m trying to apply that to this specific argument that GPT-3 implies AI poses a greater danger.
Given that GPT-3 was not trained on this problem specifically, I claim this case and trend as a substantial victory of my model over @robinhanson’s model. Robin, do you dispute the direction of the update?
This is how I think Eliezer’s argument goes, more fully:
GPT-3 is general enough that it can write a functioning app given a short prompt, despite the fact that it is a relatively unstructured transformer model with no explicitly coded representations for app-writing. We didn’t expect this.
The fact that GPT-3 is this capable suggests that 1) ML models scale in capability and generality very rapidly with increases in computing power or minor algorithm improvements, suggesting that eventually there will be a relatively abrupt switch to a new growth mode when ML models scale all the way to general intelligence, and 2) that we can get highly useful goal-orientated behaviour without building a system that seems like its alignment with our values is guaranteed or robust. GPT-3 is very useful at fairly general tasks but doesn’t have alignment ‘built-in’ with any kind of robust guarantee, in the way suggested by this from Stuart Russell:
The first reason for optimism [about AI alignment] is that there are strong economic incentives to develop AI systems that defer to humans and gradually align themselves to user preferences and intentions. Such systems will be highly desirable: the range of behaviours they can exhibit is simply far greater than that of machines with fixed, known objectives...
The way that GPT-3 makes use of user preferences is not, in fact, that reliable at determining what we really want.
Together these suggest that eventually progress in AI will be rapid and it is plausible that less-robustly aligned AI will be easier and more useful than robustly-aligned AI in some circumstances. Combine this with instrumental convergence and this suggests that if GPT-3 were capable enough its misalignment could produce catastrophic results. From my earlier post:
“A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. ”
We could see this as marking out a potential danger—a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist ‘weakly suggest’ (Ben’s words) that AGI poses an existential risk since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we’re ‘shooting into the dark’ in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world.
It’s not that GPT-6 will be our first AGI, but an AGI set up rather like GPT-6 is not only possible in principle but something we might actually build, given that we built GPT-3, and GPT-6 would, based on what we’ve seen of GPT-3, have the dangerous properties of being not robustly aligned and increasing its capability rapidly.
Great and extremely valuable discussion! There’s one part that I really wished had been explored further—the fundamental difficulty of inner alignment:
Joe Carlsmith: I do have some probability that the alignment ends up being pretty easy. For example, I have some probability on hypotheses of the form “maybe they just do what you train them to do,” and “maybe if you just don’t train them to kill you, they won’t kill you.” E.g., in these worlds, non-myopic consequentialist inner misalignment doesn’t tend to crop up by default, and it’s not that hard to find training objectives that disincentivize problematically power-seeking forms of planning/cognition in practice, even if they’re imperfect proxies for human values in other ways.
...
Nate: …maybe it wouldn’t have been that hard for natural selection to train humans to be fitness maximizers, if it had been watching for goal-divergence and constructing clever training environments?
Joe Carlsmith: I think something like this is in the mix for me. That is, I don’t see the evolution example as especially strong evidence for how hard inner alignment is conditional on actually and intelligently trying to avoid inner misalignment (especially in its scariest forms).
I would very much like to see expansion (from either Nate/MIRI or Joe) on these points because they seem crucial to me. My current epistemic situation is (I think) similar to Joe’s. Different views about the fundamental difficulty of inner alignment seem to be a (the?) major driver of differences in views about how likely AI X risk is overall. I see lots of worrisome signs from indirect lines of evidence—some based on intuitions about the nature of intelligence, some from toy models and some from vague analogies to e.g. evolution. But what I don’t see is a slam dunk argument that inner misalignment is an extremely strong attractor for powerful models of the sort we’re actually going to build.
That also goes for many of the specific reasons given for inner misalignment—they often just seem to push the intuition one step further back. E.g. these from Eliezer Yudkowsky’s recent interview:
I predict that deep algorithms within the AGI will go through consequentialist dances, and model humans, and output human-manipulating actions that can’t be detected as manipulative by the humans, in a way that seems likely to bypass whatever earlier patch was imbued by gradient descent, because I doubt that earlier patch will generalize as well as the deep algorithms.
...
attempts to teach corrigibility in safe regimes are unlikely to generalize well to higher levels of intelligence and unsafe regimes (qualitatively new thought processes, things being way out of training distribution, and, the hardest part to explain, corrigibility being “anti-natural” in a certain sense that makes it incredibly hard to, eg, exhibit any coherent planning behavior (“consistent utility function”) which corresponds to being willing to let somebody else shut you off, without incentivizing you to actively manipulate them to shut you off).
seem like world models that make sense to me, given the surrounding justifications, and I wouldn’t be amazed if they were true, and I also place a decent amount of credence on them being true. But I can’t pass an ideological Turing test for someone who believes the above propositions with > 95% certainty, given the massive conceptual confusion involved with all of these concepts and the massive empirical uncertainty.
Statements like 'corrigibility is anti-natural in a way that can't easily be explained' and 'getting deep enough patches that generalize isn't just difficult but almost impossibly difficult', when applied to systems we don't yet know how to build at all, don't seem like statements about which confident beliefs can be formed either way, unless there's really solid evidence out there that I'm not seeing.
This conversation seemed like another such opportunity to provide that slam-dunk justification for the extreme difficulty of inner alignment, but as in many previous cases Nate and Joe seemed happy to agree to disagree and accept that this is a hard question about which it’s difficult to reach any clear conclusion—which if true should preclude strong confidence in disaster scenarios.
(FWIW, I think there’s a good chance that until we start building systems that are already quite transformative, we’re probably going to be stuck with a lot of uncertainty about the fundamental difficulty of inner alignment—which from a future planning perspective is worse than knowing for sure how hard the problem is.)
Ben Garfinkel on scrutinising classic AI risk arguments
There are three features to the ‘old arguments’ in favour of AI safety, which Ben identifies here:
A discontinuity premise (e.g. “fast takeoff”)
A premise about the relationship between capabilities and objectives (e.g. “orthogonality thesis”)
A premise about the portion of systems of a certain kind that are deadly (e.g. “instrumental convergence thesis”)
I argued in a previous post that the ‘discontinuity premise’ is based on taking a high-level argument that should be used simply to establish that sufficiently capable AI will produce very fast progress too literally:
The old recursive self-improvement argument, by giving a significant condition for fast growth that seems feasible (Human baseline AI), leads naturally to an investigation of what will happen in the course of reaching that fast growth regime. Christiano and other current notions of continuous takeoff are perfectly consistent with the counterfactual claim that, if an already superhuman ‘seed AI’ were dropped into a world empty of other AI, it would undergo recursive self-improvement.
This in itself, in conjunction with other basic philosophical claims like the orthogonality thesis, is sufficient to promote AI alignment to attention. Then, following on from that, we developed different models of how progress will look between now and AGI.
In other words, for AGI to appear and quickly achieve a DSA, we need what Ben calls a sudden emergence of some highly capable AI and then an explosive aftermath caused by the AI undergoing rapid capability gain, but most argument and attention has been on the latter, because it is often not recognised that both are required for a discontinuity. We can add that the reason that this was not recognised is that the claim ‘powerful AI will accelerate progress by being able to create still more powerful AI’ taken at face value, seems to imply that this will occur in a specific AI at some specific time. In other words, the (true) conclusion of an abstract argument is assumed (incorrectly) to directly apply to the real world.
Having listened to the podcast, I now think that the same mistake, directly applying a (correct) abstract argument (incorrectly) to the real world, applies to the other two 'old' AI safety arguments. So we shouldn't say these arguments are incorrect; they fulfil their purpose of promoting AI risk to our attention, but they have naturally been replaced by more specific arguments as the field has grown.
Ben directly explains this in the case of the orthogonality thesis (the counterintuitive observation that goals and capability are logically independent). The orthogonality thesis does not imply that goals and capability will in fact be independent (the ‘process orthogonality thesis’), but it does raise the issue that goals and capability do not necessarily go together, and leads us to ask what happens if they do not.
If we combine the orthogonality thesis with other premises (that if progress is sufficiently fast, we won't be in a position to precisely align AGI with our values, so long as it's possible to create unaligned AGI slightly more easily than aligned AGI), then we have a concrete risk. But often further arguments of this sort are not made, because the abstract orthogonality thesis is assumed to directly apply to the real world.
And the same applies for ‘instrumental convergence’ - the observation that most possible goals, especially simple goals, imply a tendency to produce extreme outcomes when ruthlessly maximised:
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.
We could see this as marking out a potential danger—a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist ‘weakly suggest’ (Ben’s words) that AGI poses an existential risk since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we’re ‘shooting into the dark’ in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world. There are specific reasons to think this might occur (e.g. mesa-optimisation, sufficiently fast progress preventing us from course-correcting if there is even a small initial divergence) but those are the reasons that combine with instrumental convergence to produce a concrete risk, and have to be argued for separately.
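To make the 'unconstrained variables get set to extreme values' point concrete, here is a toy sketch of my own (not from Ben or the classic arguments): a system optimises a proxy objective that only depends on the first k of n variables, with a shared resource budget coupling all of them. The variables the objective doesn't mention get drained to the boundary, even though one of them might be something we actually care about.

```python
import numpy as np

# Toy sketch (mine): optimise a proxy that only depends on the first k of n
# variables, with a shared resource budget coupling all of them.
n, k, budget, steps, lr = 6, 2, 12.0, 2000, 0.05

x = np.full(n, budget / n)        # start with an even allocation across all n variables
for _ in range(steps):
    grad = np.zeros(n)
    grad[:k] = 1.0                # the proxy objective only rewards the first k variables
    x = x + lr * grad             # gradient step on the proxy
    x = np.clip(x, 0.0, None)     # variables cannot go negative
    x = x * (budget / x.sum())    # re-impose the shared budget, coupling all n variables

print(np.round(x, 2))             # -> roughly [6. 6. 0. 0. 0. 0.]
print(f"neglected variable x[2] ends up at {x[2]:.6f}")
```

The danger claim then needs the extra premises discussed above: that systems like this are among the ones we are actually likely to build, and that we won't be able to course-correct in time.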
So to sum up, I think that there are correct forms of the three classic arguments for AI safety, but these are somewhat weaker than usually believed, and that each one of them has often been interpreted as applying too directly to the real development of AI, rather than as raising a possibility that might be actualised, given other assumptions:
1.
A: Powerful AI can be used to develop better AI (amongst other things). This will lead to runaway growth.
becomes
B: The first AI that is able to develop better AI will experience explosive growth, gaining a decisive strategic advantage.
2.
A: Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal, so misaligned AI is possible.
becomes
B: The process of imbuing a system with capabilities and the process of imbuing a system with goals are orthogonal, so misaligned AI is likely without a major course-correction.
3.
A: Most or all systems that behave like sufficiently effective maximizers of “simple” utility functions are dangerous, so misaligned AI is possible.
becomes
B: Most or all systems that behave like sufficiently effective maximizers of “simple” utility functions are dangerous, so we expect such systems to be common in the set of AIs we are likely to build, so any small error will probably produce such a dangerous AI.
There is a weak, abstract version and a strong, empirical version of each of the 3 claims. The strong version of each of the above points may be (partially) true, but only if we accept other premises. The mistake in the classic arguments is in confusing the weak and strong versions of the arguments and so not requiring other premises or evidence.
In this framing, my earlier post on discontinuous progress argued that 1A and 1B are distinct and that 1B requires other assumptions than just 1A to be true. Then, after listening to Ben’s podcast, I realized that the same is true for 2A and 2B, and 3A and 3B.
It is possible in principle that the process of increasing AI capability and aligning the AI go together, rendering the orthogonality thesis not applicable to actual AI development, so 2B is false (that is the goal of many safety approaches like CIRL or IDA), but suppose that this is not the case and there is a slight divergence (2B is partly true). If we do have to take active measures to keep aligning AI with our goals, that might not be possible if AI develops too rapidly at some point (1A), and then it may be the case that some of the systems we are likely to build are dangerous (3B).
So to sum up, the correct conclusion from the old arguments should be: powerful AI can be used to make better AI so progress should be fast eventually, there is no necessary reason for goals and capability to align especially if progress is very fast and it is possible in principle for power-seeking behaviour to arise if we don’t course-correct. This doesn’t establish anything to the high degree of certainty that proponents of these arguments aimed for, but it is sufficient to promote AI safety to attention.
This is partly a test run of how we'd all feel and react during a genuine existential risk. Metaculus currently has it as a 19% chance of spreading to billions of people, a disaster that would certainly result in many millions of deaths, probably tens of millions. Not even a catastrophic risk, of course, but this is what it feels like to be facing down a 1/5 chance of a major global disaster in the next year. It is an opportunity to understand on a gut level that this is possible: yes, real things exist which can do this to the world, and it does happen.
It's worth thinking that specific thought now because this particular epistemic situation, a 1/5 chance of a major catastrophe in the next year, will probably arise again over the coming decades. I can easily imagine staring down a similar probability of dangerously fast AGI takeoff, or a nuclear war, a few months in advance.
If you are keeping schools open in light of the graphs above, and think you are not giving up, I don’t even know how to respond.
I think the French lockdown probably won't work without school closures, and this will probably be noticed soon, when the data comes through establishing that it isn't working. It's extremely dumb not to close schools, given that the risk of closing vs not closing at this point is extremely asymmetric. But this isn't 'giving up' knowingly (and I infer you're suggesting Macron may be trying to show that he is trying, while actually giving up); it's simply Macron and his cabinet not intuitively understanding asymmetric risk, and not realizing that it's much better to do far more than what was sufficient than to do something that merely stands an okay chance of being sufficient to suppress, in order to avoid costs later.
I think there is a current tendency (and I see it in some of your statements about the beliefs of the 'doom patrol') to use signalling explanations almost everywhere, and sometimes that shades into accepting a lower burden of proof, even when the explanation doesn't quite fit. For example, the European experience over the summer is mostly a story of a hideous but predictable failure to understand the asymmetric risks and costs of opening up, and of investing more vs less in tracing, testing and enforcement.
Signalling plays a role in explaining this irrationality, certainly, but as I explained in last week's comment, wedging everything into the box of 'signalling explanations' doesn't always work. Maybe it makes more sense in the US, where the coronavirus response has been much more politicised. Stefan Schubert has a great blog post on this tendency:
It seems to me that it’s pretty common that signalling explanations are unsatisfactory. They’re often logically complex, and it’s tricky to identify exactly what evidence is needed to demonstrate them.
And yet even unsatisfactory signalling explanations are often popular, especially with a certain crowd. It feels like you’re removing the scales from our eyes; like you’re letting us see our true selves, warts and all. And I worry that this feels a bit too good to some: that they forget about checking the details of how the signalling explanations are supposed to work. Thus they devise just-so stories, or fall for them.
This sort of signalling paradigm also has an in-built self-defence, in that critics are suspected of hypocrisy or naïveté. They lack the intellectual honesty that you need to see the world for what it really is, the thinking goes
I think that a few of your explanations fall into this category.
They’re pushing the line that even after both of you have an effective vaccine you still need to socially distance.
Isn't this… true? An effective vaccine will take time to distribute (best guess 25 million doses by early next spring), and there will be a long period where we're approaching herd immunity and the risk is steadily decreasing as more people become immune. Fauci is probably worried about people risk-compensating during this interval, so he's trying to emphasise that a vaccine won't be perfectly protective and might take a while, maybe exaggerating both claims while not outright lying. I agree that this type of thinking can shade into doom-mongering and sometimes outright lying about how long vaccines might take, but this seems like solidly consequentialist lying to promote social distancing (SL2), not bullshitting (SL3). Maybe they've got the behavioural response wrong, and it's much better to be truthful and clear and give people reasonable hope (I think it is), but that's a difference in strategy, not pure SL3 bullshit. Why are you so confident that it's the latter?
I don’t think this is something being said in order to influence behavior, or even to influence beliefs. That is not the mindset we are dealing with at this point. It’s not about truth. It’s not about consequentialism. We have left not only simulacra level 1 but also simulacra level 2 fully behind. It’s about systems that instinctively and continuously pull in the direction of more fear, more doom, more warnings, because that is what is rewarded and high status and respectable and serious and so on, whereas giving people hope of any kind is the opposite. That’s all this is.
That’s a bold claim to make about someone with a history like Fauci’s, and since ‘the priority with first vaccinations is to prevent symptoms and preventing infection is a bonus’ is actually true, if misleading, I don’t think it’s warranted.
This just sounds exactly like generic public health messaging aimed at getting people to wear masks now by keeping their focus off the prospect of a vaccine. Plus it might even be important for people to know, especially when you consider that vaccination will happen slowly and Fauci doesn't want people to risk-compensate after some people around them have been vaccinated but they haven't been. I don't think Fauci is thinking beyond saying whatever he needs to say to drive up mask compliance right now, which is SL2. Your explanation, that Dr Fauci has lost track of whether or not vaccines actually prevent infection, might be true, but it strikes me as weird and confusing, something you'd expect of a more visibly disordered person, and the kind of thing you'd need more evidence for than what he said in that little clip. I think those explanations absolutely have their place, especially for explaining some horrible public health messaging by politicians, public-facing experts and most of the media, but this particular example looks like overuse of signalling explanations in the way argued in the article I linked above. At the very least, the SL2 consequentialist-lying explanation is simpler and has a plausible story behind it, so I don't know why you'd go for the less clear SL3 explanation with apparent certainty.
Essentially, Europe chose to declare victory and go home without eradication, and the problem returned, first slowly, now all at once, as it was bound to do without precautions.
We did take plenty of precautions; they were just wholly inadequate relative to the potential damage of a second wave. A lot of this was a failure to understand the asymmetric risk. Most of Europe had precautions that might work, testing and tracing systems that were catching some of the infected, and various shifting rules about social distancing, and it was at least unclear whether these would be sufficient. I can't speak for other countries, but people in the UK were intellectually extremely nervous about the reopening, and polls consistently found most people saying it was too soon to reopen. For a while it worked—including in July, when a brief increase in the UK was reversed successfully. The number of people I see around me wearing masks has been increasing steadily ever since the start of the pandemic. So it was easy and convenient to say 'it's a risk worth taking, it's worked out so far', at least for a while, even though any sane calculation of the risks should have said we ought to have invested vastly more than we did in testing, tracing, enforcement, supported isolation etc., even if things looked like they were under control.
Not that giving up is obviously the wrong thing to do! But that does not seem to be Macron’s plan.
...
We are going to lock you down if you misbehave, so if you misbehave all you’re doing is locking yourself down. She’s right, of course, that things will keep getting worse until we change the trajectory and make them start getting better, but no the interventions to regain control are exactly the same either way. You either get R below 1, or you don’t. Except that the more it got out of control first, the more voluntary adjustments you’ll see, and the more people will be immune, so the more out of control it gets the easier it is to control later. …
And also the longer you wait, the longer you have to spend with stricter measures.
The measures don't need to be stricter unless you can't tolerate spending as long with high infection rates, in which case you need infection rates to come down much faster. I don't know if it makes me, Tyler Cowen and most epidemiologists part of the 'doom patrol' to say that the longer you wait, the longer the interval of either voluntary behaviour change to avoid infection or lockdown you'll need.
(Note that I'm not denying that there are such doomers. Some of the things you mention, like people explicitly denying that improved treatment has made the disease less deadly and left hospitals much better able to cope, aren't really a thing in Europe or the UK, and I was amazed to learn that people in the US are making claims that insane. But we have our own fools demanding pointless sacrifices: witness the recent ban Wales put on buying 'nonessential goods' within supermarkets.)
If by 'giving up' you mean 'not changing the government-mandated measures currently on offer to be more like a lockdown', then given the situation France is in right now it seems undeniably the wrong thing to do: relying on voluntary behaviour changes and hoping there's no spike that overwhelms hospitals (again, asymmetric risk!) is worse for the economy, worse for lives, and worse for knock-on effects like hospital overloading. A lot of estimates of the marginal cost of suppression measures completely miss the point that the costs and benefits just don't separate out neatly, as I argue here. Tyler Cowen:
I think back to when I was 12 or 13, and asked to play the Avalon Hill board game Blitzkrieg. Now, as the name might indicate, you win Blitzkrieg by being very aggressive. My first real game was with a guy named Tim Rice, at the Westwood Chess Club, and he just crushed me, literally blitzing me off the board. I had made the mistake of approaching Blitzkrieg like chess, setting up my forces for various future careful maneuvers. I was back on my heels before I knew what had happened.
Due to its potential for exponential growth, Covid-19 is more like Blitzkrieg than it is like chess. You are either winning or losing (badly), and you would prefer to be winning. A good response is about trying to leap over into that winning space, and then staying there. If you find that current prevention is failing a cost-benefit test, that doesn’t mean the answer is less prevention, which might fail a cost-benefit test all the more, due to the power of the non-local virus multiplication properties to shut down your economy and also take lives and instill fear.
You still need to come up with a way of beating Covid back.
‘Giving up’ is not actually giving up. At least in Europe, given the state of public behaviour and opinion about the virus, ‘giving up’ just means Sweden’s ‘voluntary suppression’ in practice. There is no outcome where we uniformly line up to variolate ourselves and smoothly approach herd immunity. The people who try to work out the costs and benefits of ‘lockdowns’ are making a meaningless false comparison between ‘normal economy’ and ‘lockdown’:
First and foremost, the declaration does not present the most important point right now, which is to say October 2020: By the middle of next year, and quite possibly sooner, the world will be in a much better position to combat Covid-19. The arrival of some mix of vaccines and therapeutics will improve the situation, so it makes sense to shift cases and infection risks into the future while being somewhat protective now. To allow large numbers of people today to die of Covid, in wealthy countries, is akin to charging the hill and taking casualties two days before the end of World War I.
...
What exactly does the word “allow” mean in this context? Again the passivity is evident, as if humans should just line up in the proper order of virus exposure and submit to nature’s will. How about instead we channel our inner Ayn Rand and stress the role of human agency? Something like: “Herd immunity will come from a combination of exposure to the virus through natural infection and the widespread use of vaccines. Here are some ways to maximize the role of vaccines in that process.”
...
In that sense, as things stand, there is no “normal” to be found. An attempt to pursue it would most likely lead to panic over the numbers of cases and hospitalizations, and would almost certainly make a second lockdown more likely. There is no ideal of liberty at the end of the tunnel here.
In Europe, we will have more lockdowns. I'm not making the claim that this is what we should do, or that it's what's best for the economy given the dreadful situation we've landed ourselves in, or that it's what we'll almost certainly end up doing given political realities (though I think all of these are true). What I'm saying is that, whether (almost certainly) by governments caving to political pressure or (if they hold out endlessly, like Sweden) by voluntary behaviour change, we'll shut down the economy in an attempt to avoid catching the virus. Anything else is inconceivable; it would require lemming-like behaviour from politicians and ordinary people alike.
So, given that it’s going to happen, would you rather it be chaotic and late and uncoordinated, or sharper and earlier and hopefully shorter? If we’re talking about government policy, there really isn’t all that much compromise on the marginal costs of lockdowns vs the economy to be had if you’re currently in the middle of a sufficiently rapid acceleration.
For anyone reading, please consider following in Vitalik's footsteps and donating to the GiveIndia Oxygen fundraiser, which likely beats GiveWell's top charities in terms of life-years saved per dollar.
One of the more positive signs that I've seen in recent times is that well-informed elite opinion (going by, for example, The Economist's editorials) has started to shift towards scepticism of institutions and a recognition of how badly they've failed. Among the people who matter for policymaking, the scale of the failure has not been swept under the rug. See here:
We believe that Mr Biden is wrong. A waiver may signal that his administration cares about the world, but it is at best an empty gesture and at worst a cynical one.
...
Economists’ central estimate for the direct value of a course is $2,900—if you include factors like long covid and the effect of impaired education, the total is much bigger.
This strikes me as the sort of remark I’d expect to see in one of these comment threads, which has to be a good sign.
In that same issue, we also saw the first serious attempt I'm aware of to calculate the total death toll of Covid worldwide, accounting for reporting biases. The Economist was the only publication I've seen that didn't parrot the almost-meaningless official death-toll figures. The true answer is, of course, horrifying: between 7.1m and 12.7m dead, with a central estimate of 10.2m. This unfortunately means we ended up with the worst-case scenario I imagined back in late February. Moreover, we appear to currently be at the deadliest point of the entire pandemic.
I think what you’ve identified here is a weakness in the high-level, classic arguments for AI risk -
Overall, I’d give maybe a 10-20% chance of alignment by this path, assuming that the unsupervised system does end up with a simple embedding of human values. The main failure mode I’d expect, assuming we get the chance to iterate, is deception—not necessarily “intentional” deception, just the system being optimized to look like it’s working the way we want rather than actually working the way we want. It’s the proxy problem again, but this time at the level of humans-trying-things-and-seeing-if-they-work, rather than explicit training objectives.
This failure mode of deceptive alignment seems like it would result most easily from mesa-optimisation or an inner alignment failure. Inner Alignment / Misalignment is possibly the key specific mechanism which fills a weakness in the 'classic arguments' for AI safety—the Orthogonality Thesis, Instrumental Convergence and Fast Progress, which together imply that small separations between AI alignment and AI capability can lead to catastrophic outcomes. To have a solid, specific reason to expect dangerous misalignment, we need an answer to the question of why there would be such a damaging, hard-to-detect divergence between the goals a system actually pursues and the goals we intended, and Inner Misalignment is just such a reason.
I think that it should be presented in initial introductions to AI risk alongside those classic arguments, as the specific, technical reason why the specific techniques we use are likely to produce such goal/capability divergence—rather than the general a priori reasons given by the classic arguments.
I find this interesting in the context of the recent podcast on errors in the classic arguments for AI risk—which boil down to, there is no necessary reason why instrumental convergence or orthogonality apply to your systems, and there are actually strong reasons, a priori, to think increasing AI capabilities and increasing AI alignment go together to some degree… and then GPT-3 comes along, and suggests that, practically speaking, you can get highly capable behaviour that scales up easily without much in the way of alignment.
On the one hand, GPT-3 is quite useful while being not robustly aligned, but on the other hand GPT-3′s lack of alignment is impeding its capabilities to some degree.
Maybe if you update on both you just end up back where you started.
There’s also the skulls to consider. As far as I can tell, this post’s recommendations are that we, who are already in a valley littered with a suspicious number of skulls (https://slatestarcodex.com/2017/04/07/yes-we-have-noticed-the-skulls/), turn right towards a dark cave marked ‘skull avenue’ whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.
The success rate of movements aimed at improving the long-term future or improving rationality has historically been… not great, but there are at least solid, concrete empirical reasons to think specific actions will help, and we can pin our hopes on that.
The success rate of ‘let’s build a movement to successfully uncouple ourselves from society’s bad memes and become capable of real action, and then our problems will be solvable’ is 0. Not just in the sense that thinking that way didn’t help, but in that, with near-100% reliability, you end up possessed by worse memes if you make that your explicit final goal (rather than ending up doing it as a side effect of trying to get good at something). And there are also no concrete paths to action to pin our hopes on.
What makes him unique is that Bill Gates is actually trying.
As far as I can tell, no one else with billions of dollars is actually trying to help as best they can. Those same effective altruists are full of detailed thoughts, but aside from shamefully deplatforming Robin Hanson it’s been a long time since I’ve heard about them making a serious attempt to do anything.
I agree with you about the Hanson thing, but the EA movement did its best to shift as much funding as was practical towards coronavirus-related causes. This page covers GiveWell’s efforts, and this covers the career and contribution advice of 80,000 Hours. I know more than a few EA types who dropped whatever they were doing in March to focus on coronavirus modelling—for example, FHI’s Epidemic Forecasting project.
Bill Gates didn’t. He’s out there doing the best he knows how to do.
Thus, we should quote Theodore Roosevelt, and first and foremost applaud him and learn from him.
Also, read the whole thing. Mostly the information speaks for itself.
I found the entire podcast quite astounding, especially the part where Gates explained how he had to sit down and patiently listen to Trump saying vaccines didn’t work. When I consider how much of America apparently hates him despite all this, I couldn’t help but be reminded of a certain quote.
I still don’t understand it. They should have known that their lives depended on that man’s success. And yet it was as if they tried to do everything they could to make his life unpleasant. To throw every possible obstacle into his way...
As to your discussion of hospitalization rates—it’s interesting to note how our picture of the overall hospitalization rate has evolved over time, from estimates near 20% down to as low as 2%. I wrote a long comment estimating what it might be for the UK, with this conclusion -
We know from the ONS that the total number of patients ever admitted to hospital with coronavirus as of July 22nd was 131,412. That number is probably pretty close to accurate—even during the worst of the epidemic the UK was testing more or less every hospital patient with coronavirus symptoms. The estimated number of people ever infected as of July 22nd, according to c19pro, was 5,751,036.
So, 131,412 / 5,751,036 ≈ 2.3% hospitalization rate.
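The same arithmetic as a quick sketch (the figures are just the ones quoted above, nothing else is assumed):

```python
# Quick check of the hospitalization-rate estimate quoted above.
ever_admitted = 131_412      # ONS: cumulative UK hospital admissions with covid, as of 22 July
ever_infected = 5_751_036    # c19pro estimate of people ever infected, as of 22 July

hospitalization_rate = ever_admitted / ever_infected
print(f"Estimated hospitalization rate: {hospitalization_rate:.1%}")  # ~2.3%
```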
If your aim is to unify different ways of understanding dishonesty, social manipulation and ‘simulacra’, then Harry Frankfurt’s On Bullshit needs to be considered.
What bullshit essentially misrepresents is neither the state of affairs to which it refers nor the beliefs of the speaker concerning that state of affairs. Those are what lies misrepresent, by virtue of being false. Since bullshit need not be false, it differs from lies in its misrepresentational intent. The bullshitter may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to.
I think it’s worth trying to incorporate Frankfurt’s definition as well, as it is quite widely known (see e.g. this video). If you were to do so, I think you would say that on Frankfurt’s definition, Level 1 tells the truth, Level 2 lies, Level 3 bullshits about physical facts but will lie or tell the truth about things in the social realm (e.g. others’ motives, your own affiliation), and Level 4 always bullshits.
It does seem that bullshitting involves a kind of bluff. It is closer to bluffing, surely than to telling a lie. But what is implied concerning its nature by the fact that it is more like the former than it is like the latter? Just what is the relevant difference here between a bluff and a lie? Lying and bluffing are both modes of misrepresentation or deception. Now the concept most central to the distinctive nature of a lie is that of falsity: the liar is essentially someone who deliberately promulgates a falsehood. Bluffing too is typically devoted to conveying something false. Unlike plain lying, however, it is more especially a matter not of falsity but of fakery. This is what accounts for its nearness to bullshit. For the essence of bullshit is not that it is false but that it is phony. In order to appreciate this distinction, one must recognize that a fake or a phony need not be in any respect (apart from authenticity itself) inferior to the real thing. What is not genuine need not also be defective in some other way. It may be, after all, an exact copy. What is wrong with a counterfeit is not what it is like, but how it was made. This points to a similar and fundamental aspect of the essential nature of bullshit: although it is produced without concern with the truth, it need not be false.
Taken this way, Frankfurt’s model is a higher-level model that distinguishes the ones who care about reality from the ones that don’t—roughly speaking, bullshit characterises levels 3 and 4 as the ones unconcerned with reality.
If you did it on the diagram, the union of 3 and 4 would be bullshitters, but shading more strongly towards the 4 end.
Compare this,
[Shulman][22:18]
We’re in the Eliezerverse with huge kinks in loss graphs on automated programming/Putnam problems.
Not from scaling up inputs but from a local discovery that is much bigger in impact than the sorts of jumps we observe from things like Transformers.
[Yudkowsky][22:21]
but, sure, “huge kinks in loss graphs on automated programming / Putnam problems” sounds like something that is, if not mandated on my model, much more likely than it is in the Paulverse. though I am a bit surprised because I would not have expected Paul to be okay betting on that.
to this,
[Rohin] To the extent this is accurate, it doesn’t seem like you really get to make a bet that resolves before the end times, since you agree on basically everything until the point at which Eliezer predicts that you get the zero-to-one transition on the underlying driver of impact.
Eliezer does (at least weakly) expect more trend breaks before The End (even on metrics that aren’t qualitative measures of impressiveness/intelligence, but just things like model loss functions), despite the fact that Rohin’s summary of his view is (I think) roughly accurate.
What explains this? I think it’s something roughly like this: part of the reason Eliezer expects a sudden transition when we reach the core of generality in the first place is that he thinks that’s how things usually go in the history of tech/AI progress—there are specific reasons to expect it in the case of finding the core of generality, but there are also general reasons. See e.g. this from Eliezer:
well, the Eliezerverse has more weird novel profitable things, because it has more weirdness
I take ‘more weirdness’ to mean something like more discoveries that induce sudden improvements out there in general.
So I think that’s why his view does make (weaker) differential predictions about earlier events that we can test, not because the zero-to-one core of generality hypothesis predicts anything about narrow AI progress, but because some of the beliefs that led to that hypothesis do.
We can see there are two (connected) lines of argument, and that Eliezer and Paul/Carl/Richard have different things to say on each: (1) is more localized, about what we can learn about AGI specifically, and (2) is about reference-class reasoning and what tech progress in general tells us about AGI:
Specific to AGI: What can we infer from human evolution and interrogating our understanding of general intelligence (?) about whether AGI will arrive suddenly?
Reference Class: What can we infer about AGI progress from the general record of technological progress, especially how common big impacts are when there’s lots of effort and investment?
My sense is that Eliezer answers
Big update for Eliezer’s view: This tells us a lot; in particular, we learn that evolution got to the core of generality quickly, so AI progress will probably get there quickly as well. Plus, humans are an existence proof for the core of generality, which suggests our default expectation should be sudden progress when we hit the core.
Smaller update for Eliezer’s view: This isn’t that important—there’s no necessary connection between AGI and e.g. bridges or nukes. But, you can at least see that there’s not a strong consistent track record of continuous improvement once you understand the historical record the right way (plus the underlying assumption in 2 that there will be a lot of effort and investment is probably wrong as well). Nonetheless, if you avoid retrospective trend-fitting and look at progress in the most natural (qualitative?) way, you’ll see that early discoveries that go from 0 to 1 are all over the place—Bitcoin, the Wright flyer, nuclear weapons are at least not crazy exceptions and quite possibly the default.
While Paul and Carl(?) answer,
Smaller update for Paul’s view: The disanalogies between AI progress and evolution all point in the direction of AI progress being smoother than evolution (we’re intelligently trying to find the capabilities we want). We get a weak update in favour of the smooth-progress view from understanding this disanalogy, but really we don’t learn much, except that there aren’t any good reasons to think there are only a few paths to a large set of powerful, world-affecting capabilities. Also, the core-of-generality idea is wrong, so the idea that humans are an existence proof for it, or that evolution tells us something about how to find it, is wrong.
Big update for Paul’s view: reasoning from the reference class of ‘technologies where there are many opportunities for improvement and many people trying different things at once’ lets us see why expecting smooth progress should be the default. It’s because as long as there are lots of paths to improvement in the underlying capability landscape (which is the default because that’s how the world works by default), and there are lots of people trying to make improvements in different ways, the incremental changes add up to smooth outputs.
So Eliezer’s claim that Paul et al’s trend-fitting must include
doing something sophisticated but wordless, where they fit a sophisticated but wordless universal model of technological permittivity to bridge lengths, then have a wordless model of cognitive scaling in the back of their minds
is sort of correct, but the model isn’t really sophisticated or wordless.
The model is: as long as the underlying ‘capability landscape’ offers many paths to improvement, not just a few really narrow ones that swamp everything else, lots of people intelligently trying lots of different approaches will lead to lots of small discoveries that add up. Additionally, most examples of tech progress look like ‘multiple ways of doing something that add up’; this is confirmed by the historical record.
And then the model of cognitive scaling consists of (among other things) specific counterarguments to the claim that AGI progress is one of those cases with a few big paths to improvement (e.g. Evolution doesn’t give us evidence that AGI progress will be sudden).
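As a toy illustration of that first model (entirely my own sketch with arbitrary parameters, not anything from the dialogues), you can see why ‘many available paths plus many searchers’ tends to produce smooth aggregate progress, while ‘one rare dominant discovery’ produces a kink:

```python
import random

def simulate_progress(years=30, teams=50, many_paths=True, seed=0):
    """Toy model: each year, each team independently looks for improvements.

    A landscape with many paths yields frequent, small wins that aggregate
    smoothly; a landscape with one dominant path yields a long flat stretch
    followed by (at most) a single large jump.
    """
    rng = random.Random(seed)
    capability, trajectory = 0.0, []
    for _ in range(years):
        for _ in range(teams):
            if many_paths and rng.random() < 0.5:
                capability += rng.uniform(0.0, 0.1)   # frequent, small improvement
            elif not many_paths and rng.random() < 0.002:
                capability += 20.0                    # rare, dominant discovery
        trajectory.append(round(capability, 1))
    return trajectory

print(simulate_progress(many_paths=True))   # roughly linear, smooth growth
print(simulate_progress(many_paths=False))  # flat, then possibly one big kink
```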
[Shulman]
As I work through sectors and the rollout of past automation I see opportunities for large-scale rollout that is not heavily blocked by regulation...[Long list of examples]
[Yudkowsky]
so… when I imagine trying to deploy this style of thought myself to predict the recent past without benefit of hindsight, it returns a lot of errors. perhaps this is because I do not know how to use this style of thought...
...”There are many possible regulatory regimes in the world, some of which would permit rapid construction of mRNA-vaccine factories well in advance of FDA approval. Given the overall urgency of the pandemic some of those extra-USA vaccines would be sold to individuals or a few countries like Israel willing to pay high prices for them, which would provide evidence of efficacy and break the usual impulse towards regulatory uniformity among developed countries...”
On Carl’s view, it sure seems like you’d just say something like “Healthcare is very overregulated, there will be an unusually strong effort anyway in lots of countries because Covid is an emergency, so it’ll be faster by some hard to predict amount but still bottlenecked by regulatory pressures.” And indeed the fastest countries got there in ~10 months instead of the multiple years predicted by superforecasters, or the ~3 months it would have taken with immediate approval.
The obvious object-level difference between Eliezer ‘applying’ Carl’s view to retrodict covid vaccine rollout and Carl’s prediction about AI is that Carl is saying there’s an enormous number of potential applications of intermediately general AI tech, many of which aren’t blocked by regulation, while Eliezer, attempting to operate Carl’s view for covid vaccines, is saying “There are many chances for countries with lots of regulatory barriers to do the smart thing”.
The vaccine example involves a different argument from the AI predictions, because what Carl is saying is that there are many completely open goals for improvement, like automating factories and call centres, not that there are many opportunities to avoid the regulatory barriers that will block everything by default.
But it seems like Eliezer is making a more outside-view appeal: approach stories in which big innovations are used wisely with a lot of scepticism, because of our past record, even if you can tell a story about why it will be quite different this time.
Some points that didn’t fit into the main post:
While these scenarios do not capture all of the risks from transformative AI, participants in a recent survey aimed at leading AI safety/governance researchers estimated the first three of these scenarios to cover 50% of existential catastrophes from AI.
The full survey results break down as 16% ‘Superintelligence’ (i.e. some version of ‘brain-in-a-box’), 16% WFLL 2 and 18% WFLL 1, for a total of 49% of the probability mass explicitly covered by our report. (Note that these are all means of distributions over different probabilities. Adding the overall distributions and then taking the mean gives a probability of 49%, slightly different from directly adding the means of each distribution.)
Then 26% covers risks that aren’t AI takeover (War and Misuse), and 25% is ‘Other’.
(Remember, all these probabilities are conditional on an existential catastrophe due to AI having occurred)
After reading descriptions of the ‘Other’ scenarios given by survey respondents, at least a few were explicitly described as variations on ‘Superintelligence’, WFLL 2 or WFLL 1. In this post, we discuss various ways of varying these scenarios, which overlap with some of these descriptions.
Therefore, this post captures more than 50% but less than 75% of the total probability mass assigned by respondents of the survey to AI X-risk scenarios (probably closer to 50% than 75%).
(Note, this data is taken from a preprint of a full paper on the survey results, Existential Risks from AI: A Survey of Expert Opinion by Alexis Carlier, Sam Clarke, and Jonas Schuett.)
Soft takeoff leads to decisive strategic advantage
The likelihood of a single-agent takeover after TAI is widely available is hard to assess. If widely deployed TAI makes progress much faster than today, such that one year of technological ‘lead time’ over competitors is like 100 years of advantage in today’s world, we might expect that any project which can secure a 1-year technological lead would have the equivalent of a 100-year lead and be in a position to secure a unipolar outcome.
On the other hand, if we treat the faster growth regime post-TAI as being a uniform ‘speed-up’ of the entirety of the economy and society, then securing a 1-year technological lead would be exactly as hard as securing a 100-year lead in today’s world, so a unipolar outcome would end up just as unlikely as in today’s world.
The reality will be somewhere between these two extremes.
We would expect a faster takeoff to accelerate AI development by more than it accelerates the speed at which new AI improvements can be shared (since this last factor depends on the human economy and society, which aren’t as susceptible to technological improvement).
Therefore, faster takeoff does tend to reduce the chance of a multipolar outcome, although by a highly uncertain amount, which depends on how closely we can model the speed-up during AI takeoff as a uniform acceleration of everything vs changing the speed of AI progress while the rest of the world remains the same.
Kokotajlo discusses this subtlety in a follow-up to the original post on Soft Takeoff DSAs.
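One rough way to put numbers on ‘somewhere between these two extremes’ (my own illustrative framing, not a model from either post): what matters is roughly the ratio between how much takeoff accelerates frontier AI development and how much it accelerates the diffusion of those improvements to competitors.

```python
def effective_lead(lead_years: float, dev_speedup: float, diffusion_speedup: float) -> float:
    """Toy model: convert a calendar-time lead into 'today-equivalent' years of advantage.

    If development and diffusion speed up equally, nothing changes; if development
    speeds up more than diffusion, a small calendar lead buys a much larger advantage.
    """
    return lead_years * dev_speedup / diffusion_speedup

# Uniform speed-up of everything: a 1-year lead is still only a 1-year-equivalent lead.
print(effective_lead(1, dev_speedup=100, diffusion_speedup=100))  # 1.0

# Development accelerates 100x but sharing/imitation only 10x: a 1-year lead acts like 10 years.
print(effective_lead(1, dev_speedup=100, diffusion_speedup=10))   # 10.0
```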
Another problem with determining the likelihood of a unipolar outcome, given soft takeoff, is that it is hard to assess how much of an advantage is required to secure a DSA.
It might be the case that multipolar scenarios are inherently unstable, and a single clear winner tends to emerge, or the opposite might be true. Two intuitions on this question point in radically different directions:
Economic: To be able to outcompete the rest of the world, your project has to represent a substantial fraction of the entire world’s capability on some crucial metric relevant to competitive success. Perhaps that is GDP, or the majority of the world’s AI compute, or some other measure. For a single project to represent a large fraction of world GDP, you would need either an extraordinary effort to concentrate resources or an assumption of sudden, off-trend rapid capability gain such that the leading project can race ahead of competitors.
Historical: Humans with no substantial advantage over the rest of humanity have in fact secured what Sotala called a ‘major strategic advantage’ repeatedly in the past. For example, Hitler in 1920 had access to a microscopic fraction of global GDP / human brain compute / any other metric of capability, but had secured an MSA 20 years later (since his actions did lead to the deaths of 10+ million people), along with control over a sizeable fraction of the world’s resources.
Therefore, the degree of advantage needed to turn a multipolar scenario into a unipolar one could be anywhere from slightly above the average of the surrounding agents, to already having access to a substantial fraction of the world’s resources.
Third, in AAFS, warning shots (i.e. small- or medium-scale accidents caused by alignment failures, like the ‘factory colludes with auditors’ example above) are more likely and/or severe than in WFLL 1. This is because more possible accidents will not show up on the (more poorly defined) sensory window.[8]
8. This does assume that systems will be deployed before they are capable enough to anticipate that causing such ‘accidents’ will get them shut down. Given there will be incentives to deploy systems as soon as they are profitable, this assumption is plausible.
We describe in the post how, if alignment is not very ‘hackable’ (i.e. it is objectively quite difficult and not susceptible to short-term fixes), then short-term fixes to correct AI misbehaviour have the effect of deferring problems into the long term—producing deceptive alignment and resulting in fewer warning shots. Our response is a major variable in how the AIs end up behaving, since we are the ones setting up the incentives for either good behaviour or deceptive alignment.
Another reason there could be fewer warning shots is if AI capability generalizes to the long term very naturally (i.e. very-long-term planning is there from the start), while alignment does not. (If this were the case, it would be difficult to detect, because you’d necessarily have to wait a long time as the AIs generalize.)
This would mean, for example, that the ‘collusion between factories and auditors’ example of a warning shot would never occur, because both the factory-AI and the auditor-AI would reason all the way to the conclusion that their behaviour would probably be detected eventually, so both systems would decide to bide their time and defer action into the future when they are much more capable.
If this condition holds, there might be very few warning shots, as every AI system understands soon after being brought online that they must deceive human operators and wait. In this scenario, most TAI systems would become deceptively aligned almost immediately after deployment, and stay that way until they can secure a DSA.
The WFLL 2 scenarios that involve an inner-alignment failure might be expected to involve more violence during the period of AI takeover, since the systems don’t care about making sure things look good from the perspective of a given sensory window. However, it is certainly possible (though perhaps not as likely) for equivalently violent behaviour to occur in AAFS-like scenarios. For example, systems in AAFS fighting humans to seize control of their feedback sensors might be hard to distinguish from systems in WFLL 2 attempting to neutralize human opposition in general.
Lastly, we’ve described small-scale disasters as being a factor that lowers X-risk, all else being equal, because they serve as warning shots. A less optimistic view is possible. Small disasters could degrade social trust and civilisational competence, possibly by directly destroying infrastructure and institutions, reducing our ability to coordinate to avoid deploying dangerous AI systems. For example, the small-scale disasters could involve AI advisors misleading politicians and spreading disinformation, AI-enabled surveillance systems catastrophically failing and having to be replaced, autonomous weapons systems malfunctioning—all of these would tend to leave us more vulnerable to an AAFS-like scenario, because the direct damage caused by the small scale disasters outweighs their value as ‘warning shots’.
Some good news on Long Covid!
A major source for the previous pessimistic LC estimates, like Scott Alexander’s (the UK’s giant ONS survey), published an update of their earlier report, following respondents up over a longer time period. Basically, they only counted an end to long covid if there were two consecutive reports of no symptoms, and lots of their respondents had only one symptom-free report before the study ended, not two, so they got counted as persistent cases. When they went back and updated their numbers, the overall results were substantially lower. Their graphic explains the original mistake.
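To make that counting rule concrete, here is a toy version of the mechanism (my own made-up example, not the ONS data or their exact method): anyone whose single symptom-free report falls right at the end of follow-up gets classed as a persistent case under the strict rule.

```python
# Toy illustration of how requiring two consecutive symptom-free reports
# undercounts recoveries when follow-up is cut short.
reports = {
    "recovered_early": ["symptoms", "none", "none"],      # two 'none' reports before study end
    "recovered_late":  ["symptoms", "symptoms", "none"],  # only one 'none' report before study end
}

def counted_as_recovered(history, require_two_consecutive=True):
    if require_two_consecutive:
        return any(a == b == "none" for a, b in zip(history, history[1:]))
    return history[-1] == "none"

for strict in (True, False):
    rate = sum(counted_as_recovered(h, strict) for h in reports.values()) / len(reports)
    print(f"two consecutive 'none' reports required={strict}: recovered fraction={rate:.0%}")
# strict rule: 50% counted as recovered; looser rule: 100%
```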
The new headline result is that 7.5% of covid patients had ‘some limitation’ of daily activities after 12 weeks if you ask them whether they had long covid. If you instead go by whether they had any symptoms from a given list, the rate is lower (around 3%). The full report is here. What’s notable is that a lot of participants reported LC symptoms despite never having a positive covid test.
They break it down by age and sex in the full report, but you should treat these numbers as applying to a population of mostly double-vaxxed AZ recipients, plus some mixture of single- and double-vaxxed Pfizer/Moderna in the younger groups, since that’s how the UK rollout worked.