Previously “Lanrian” on here. Research analyst at Redwood Research. Views are my own.
Feel free to DM me, email me at [my last name].[my first name]@gmail.com or send something anonymously to https://www.admonymous.co/lukas-finnveden
> Another example I would cite was the response to If Anyone Builds It, Everyone Dies by the core EA people, including among others Will MacAskill himself and also the head of CEA. This was a very clear example of PR mindset, where quite frankly a decision was made that this was a bad EA look, the moves it proposes were unstrategic, and thus the book should be thrown overboard.
FWIW, while I got a vibe like this from the head of CEA’s review, I didn’t get this vibe from Will’s review. The vibe I got from Will’s review was an interest in the arguments and whether they really supported the strong claims of the book.
And the complaints I saw about Will’s review (at least the ones that I was sympathetic to, rather than ones that seemed just very off-base) weren’t “this is insufficiently truth-seeking”. Rather, they were “this is too nitpicky, you should be putting more emphasis on stuff you agree with, because it’s important that readers understand that AI takeover risk is high and the situation isn’t currently being handled well”.
From a pure signaling perspective (the "legibly" part of "legibly have as little COI as possible") there's also a counter-consideration: if someone says that there's danger, and calls for prioritizing safety, that might be even more credible if it goes against their financial motivations.
I don't think this matters much for company-external comms. There, I think it's better to just be as legibly free of COIs as possible, because listeners struggle to tell what's actually in the company's best interests. (I might once have thought differently, but empirically "they just say that superintelligence might cause extinction because that's good for business" is a very common take.)
But for company-internal comms, I can imagine that someone would be more persuasive if they could say "look, I know this isn't good for your equity, it's not good for mine either. we're in the same boat. but we gotta do what's right".
> Goodness is (roughly) whatever stuff the memes say one should value.
Looking at that first one, the second might seem kind of silly. After all, we mostly don’t get to choose what triggers yumminess or yearning.
A lot of goodness is about what you should do rather than what you should feel yearning for. There’s less conflict there. Even if you can’t change what you feel yearning for, you can change what you do.
Anthropic's escape clause is footnote 17 here. The conditions are that Anthropic will acknowledge the risks and invest significant effort in regulation that mitigates them. (Technically that doesn't require them to say that they're relying on the escape clause, I guess, but I think it would be pretty egregious for them to say that they technically fulfil those criteria now. I don't expect that anyone sees themselves as relying on the escape clause atm.)
Seems false? You can violate an RSP by developing or deploying models under conditions where your current RSP says you won’t.
It's true that some have an escape clause that allows for deployment when others are racing ahead. (And more generally, you can revise the RSP.) But this requires specific actions (public revisions, or maybe making it clear when the company is relying on the escape clause); it's not that anything goes.
Certainly the track record is disappointing compared to what’s possible, and what seems like it ought to be reasonable. And the track record shows that even pretty obvious mistakes are common. And I imagine that success probability falls off worryingly quickly as success requires more foresight and allows for less trial and error. (Fwiw, I think all this is compatible with “humans trying to prevent bad events very often prevents bad events”, when quantifying over a very broad range of possible events.)
The most analogous argument that applies to us would be: Bad events are very often prevented by humans being moderately competent and successfully trying to prevent bad events.
Which is indeed a great reason to be more optimistic about the situation than if that wasn’t true. Indeed, I expect humans to put in many, many orders of magnitude more effort on alignment (and alignment evaluation) than Klurl and Trapaucius did in the story. Still unclear if it’ll be sufficient.
> By such rationalizations, Klurl, you can excuse any possible example I try to bring you, to show you that by default Reality is a safe, comfortable, unchanging, unsurprising, and above all normal place! You will just say some sort of ‘filter’ is involved! Well, my position is just that, by one means or another, the fleshlings will no doubt be subjected to some similar filter
So this bit turned out to actually be a valid argument for the situation being safe. Their reality did have a track record of not being blown up by new intelligences, and there was a systematic reason for that which saved them from the fleshlings too. (Though it failed as an argument for why the fleshlings would “end up with emotions that mechanical life would find normal and unsurprising.”)
Not super reassuring for our own future though. Our reality doesn’t seem systematically safe/comfortable/unchanging/unsurprising to me.
I’d have thought that the people with fertility problems might be even better to study than the voluntarily childless ones — because there’s less of a causal connection “obsessed with their job → chooses to not have children” which seems like a major confounder to the primary object of study (“chooses to have children → less high-quality focus on job”).
My vague impression of the authors’ position is approximately that:
- AIs are alien and will have different goals-on-reflection than humans.
- They'll become power-seeking when they become smart enough and have enough thinking time to realize that they have different goals than humans and that this implies that they ought to take over (if they get a good opportunity). This is within the human range of smartness.
I'm not sure what the authors think about the argument that you can get the above two properties in a regime where the AI is too dumb to hide its misalignment from you, and that this gives you a great opportunity to iterate and learn from experiment. (Maybe just that the iteration will produce an AI that's good at hiding its scheming before it produces one that isn't inclined to scheme at all? Or that it'll produce one that doesn't scheme in your test cases, but will start scheming once you give it much more time to think on its own, and you can't afford much testing and iteration on years' or decades' worth of thinking.)
My impression is that the authors held similar views significantly before they started Mechanize. So the explanatory model that these views are downstream of working at Mechanize, and wanting to rationalize that, seems wrong to me.
I’m somewhat sympathetic to this reasoning. But I think it proves too much.
For example: If you’re very hungry and walk past someone’s fruit tree, I think there’s a reasonable ethical case that it’s ok to take some fruit if you leave them some payment, if you’re justified in believing that they’d strongly prefer the payment to having the fruit. Even in cases where you shouldn’t have taken the fruit absent being able to repay them, and where you shouldn’t have paid them absent being able to take the fruit.
I think the reason for this is related to how it’s nice to have norms along the lines of “don’t leave people on-net worse-off” (and that such norms are way easier to enforce than e.g. “behave like an optimal utilitarian, harming people when optimal and benefitting people when optimal”). And then lots of people also have some internalized ethical intuitions or ethics-adjacent desires that work along similar lines.
And in the animal welfare case, instead of trying to avoid leaving a specific person worse-off, it’s about making a class of beings on-net better-off, or making a “cause area” on-net better-off. I have some ethical intuitions (or at least ethics-adjacent desires) along these lines and think it’s reasonable to indulge them.
I thought a potential issue with wild-caught fish is that other consumers would simply substitute away from wild to farmed fish, since most people don't care much and the supply of wild-caught fish isn't very elastic.
But anchovies and sardines (as suggested in the post) seem like they avoid that issue since apparently there’s basically no farming of them.
I also think it's just super reasonable to eat animal products and offset with donations — which can easily net reduce animal suffering, given how good the donation opportunities are.
IMO, a big appeal of controlled takeoff is that, if successful, it slows down all of takeoff.
Whereas a global shutdown, which might happen before we have great automated alignment research and which might incidentally ban a lot of safety research as well, might just end some number of years later, whereupon we might quickly go through the remainder of takeoff and incur roughly as much risk as we would have without the shutdown.
(Things that could cause a shutdown to end: elections or deaths swapping out who rules countries, geopolitical power shifts, verification becoming harder as it becomes more plausible that people could invest a lot to develop and hide compute and data centers where they can't be seen, and maybe AI software efficiency advancing via smaller-scale experiments that are hard to ban.)
Successful controlled takeoff definitely seems more likely to me than "shutdown so long that intelligence-augmented humans have time to grow up", and also more likely than "shutdown so long that we can solve superintelligence alignment up front without having very smart models to help us or to experiment with".
Short shutdown to do some prep before controlled takeoff seems reasonable.
Edit: I guess technically, some very mildly intelligence-augmented humans (via embryo selection) are already being born, and they have a decent chance to grow up before superintelligence even without a shutdown. I was thinking about intelligence augmentation that was good enough to significantly reduce x-risk. (Though I'm not sure how long people expect that to take.)
Lots of plausible mechanisms by which something could be “a little off” suggested in this Rohin comment.
This is the most compelling version of “trapped priors” I’ve seen. I agreed with Anna’s comment on the original post, but the mechanisms here make sense to me as something that would mess a lot with updating. (Though it seems different enough from the very bayes-focused analysis in the original post that I’m not sure it’s referring to the same thing.)
I think that’s true in how they refer to it.
But it’s also a bit confusing, because I don’t think they have a definition of superintelligence in the book other than “exceeds every human at almost every mental task”, so AIs that are broadly moderately superhuman ought to count.
Edit: No wait, correction:
A few pages later they say:
> We will describe it using the term “superintelligence,” meaning a mind much more capable than any human at almost every sort of steering and prediction problem — at least, those problems where there is room to substantially improve over human performance.*
Hm, you seem more pessimistic than I feel about the situation. E.g. I would’ve bet that Where I agree and disagree with Eliezer added significant value and changed some minds. Maybe you disagree, maybe you just have a higher bar for “meaningful change”.
(Where, tbc, I think your opportunity cost is very high so you should have a high bar for spending significant time writing lesswrong content — but I’m interpreting your comments as being more pessimistic than just “not worth the opportunity cost”.)
> This is roughly what seems to have happened in DC, where the internal influence approach was swept away by a big Overton window shift after ChatGPT.
In what sense was the internal influence approach “swept away”?
Also, it feels pretty salient to me that the ChatGPT shift was triggered by public, accessible empirical demonstrations of capabilities being high (and the social impacts of that). So in my mind that provides evidence for "groups change their mind in response to certain kinds of empirical evidence" and doesn't really provide evidence for "groups change their mind in response to a few brave people saying what they believe and changing the Overton window".
If the conversation changed a lot causally downstream of the CAIS extinction letter or the FLI pause letter, that would be better evidence for your position (though it would also be consistent with a model that puts less weight on preference cascades and models the impact more like "policymakers weren't aware that lots of experts were concerned, and this letter communicated that experts were concerned"). I don't know to what extent this was true. (Though I liked the CAIS extinction letter a lot and certainly believe it had a good amount of impact — I just don't know how much.)
Is this not common in politics? I thought this was a lot of what politics was about. (Having never worked in politics.)
And corporate PR campaigns too for that matter.