Lying is Cowardice, not Strategy

24 Oct 2023 13:24 UTC

LW: 33 AF: -20

(Co-written by Connor Leahy and Gabe)

We have talked to a whole bunch of people about pauses and moratoriums. Members of the AI safety community, investors, business peers, politicians, and more.

Too many claimed to pursue the following approach:

It would be great if AGI progress stopped, but that is infeasible.
Therefore, I will advocate for what I think is feasible, even if it is not ideal.
The Overton window being what it is, if I claim a belief that is too extreme, or endorse an infeasible policy proposal, people will take me less seriously on the feasible stuff.
Given this, I will be tactical in what I say, even though I will avoid stating outright lies.

Consider if this applies to you, or people close to you.

If it does, let us be clear: hiding your beliefs, in ways that predictably lead people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.

Not only is it morally wrong, it makes for a terrible strategy. As it stands, the AI Safety Community itself cannot coordinate to state that we should stop AGI progress right now!

Not only can it not coordinate, the AI Safety Community is defecting, by making it more costly for people who do say it to say it.

We all feel like we are working on the most important things, and that we are being pragmatic realists.

But remember: If you feel stuck in the Overton window, it is because YOU ARE the Overton window.

—

1. The AI Safety Community is making our job harder

In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.

Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.

To date, in our efforts to inform, motivate and coordinate with people: People in the AI Safety Community publicly lying has been one of the biggest direct obstacles we have encountered.

The newest example of this is “Responsible Scaling Policies”, with many AI Safety people being much more vocal about their endorsement of RSPs than their private belief that in a saner world, all AGI progress should stop right now.

Because of them, we have been told many times that we are a minority voice, and that most people in the AI Safety community (understand, Open Philanthropy adjacent) disagree that we should stop all AGI progress right now.

That actually, there is an acceptable way to continue scaling! And given that this makes things easier, if there is indeed an acceptable way to continue scaling, this is what we should do, rather than stop all AGI progress right now!

Recently, Dario Amodei (Anthropic CEO), has used the RSP to frame the moratorium position as the most extreme version of an extreme position, and this is the framing that we have seen used over and over again. ARC mirrors this in their version of the RSP proposal, describing itself as a “pragmatic middle ground” between a moratorium and doing nothing.

Obviously, all AGI Racers use this against us when we talk to people.

There are very few people that we have consistently seen publicly call for a stop to AGI progress. The clearest ones are Eliezer’s “Shut it All Down” and Nate’s “Fucking stop”.

The loudest silence is from Paul Christiano, whose RSPs are being used to safety-wash scaling.

Proving me wrong is very easy. If you do believe that, in a saner world, we would stop all AGI progress right now, you can just write this publicly.

When called out on this, most people we talk to just fumble.

2. Lying for Personal Gain

We talk to many people who publicly lie about their beliefs.

The justifications are always the same: “it doesn’t feel like lying”, “we don’t state things we do not believe”, “we are playing an inside game, so we must be tactical in what we say to gain influence and power”.

Let me call this for what it is: lying for personal gain. If you state things whose main purpose is to get people to think you believe something else, and you do so to gain more influence and power: you are lying for personal gain.

The results of this “influence and power-grabbing” have many times over materialised with the safety-washing of the AGI race. What a coincidence it is that DeepMind, OpenAI and Anthropic are all related to the AI Safety community.

The only benefit we see from this politicking is the people lying gain more influence, while the time we have left to AGI keeps getting shorter.

Consider what happens when a community rewards the people who gain more influence by lying!

—

So many people lie, and they screw not only humanity, but one another.

Many AGI corp leaders will privately state that in a saner world, AGI progress should stop, but they will not state it because it would hurt their ability to race against each other!

Safety people will lie so that they can keep ties with labs in order to “pressure them” and seem reasonable to politicians.

Whatever: they just lie to gain more power.

“DO NOT LIE PUBLICLY ABOUT GRAVE MATTERS” is a very strong baseline. If you want to defect, you need a much stronger reason than “it will benefit my personal influence, and I promise I’ll do good things with it”.

And you need to accept the blame when you’re called out. You should not muddy the waters by justifying your lies, covering them, telling people they misunderstood, and trying to maintain more influence within the community.

We have seen so many people be taken in this web of lies: from politicians and journalists, to engineers and intellectuals, all up until the concerned EA or regular citizen who wants to help, but is confused by our message when it looks like the AI safety community is ok with scaling.

Your lies compound and make the world a worse place.

There is an easy way to fix this situation: we can adopt the norm of publicly stating our true beliefs about grave matters.

If you know someone who claims to believe that in a saner world we should stop all AGI progress, tell them to publicly state their beliefs, unequivocally. Very often, you’ll see them fumbling, caught in politicking. And not that rarely, you’ll see that they actually want to keep racing. In these situations, you might want to stop finding excuses for them.

3. The Spirit of Coordination

A very sad thing that we have personally felt is that it looks like many people are so tangled in these politics that they do not understand what the point of honesty even is.

Indeed, from the inside, it is not obvious that honesty is a good choice. If you are honest, publicly honest, or even adversarially honest, you just make more opponents, you have less influence, and you can help less.

This is typical deontology vs consequentialism. Should you be honest, if from your point of view, it increases the chances of doom?

The answer is YES.

a) Politicking has many more unintended consequences than expected.

Whenever you lie, you shoot potential allies at random in the back.
Whenever you lie, you make it more acceptable for people around you to lie.

b) Your behavior, especially if you are a leader, a funder or a major employee (first 10 employees, or responsible for >10% of the headcount of the org), ripples down to everyone around you.

People lower in the respectability/authority/status ranks do defer to your behavior.
People outside of these ranks look at you.
Our work toward stopping AGI progress becomes easier whenever a leader/investor/major employee at Open AI, DeepMind, Anthropic, ARC, Open Philanthropy, etc. states their beliefs about AGI progress more clearly.

c) Honesty is Great.

Existential Risks from AI are now going mainstream. Academics talk about it. Tech CEOs talk about it. You can now talk about it, not be a weirdo, and gain more allies. Polls show that even non-expert citizens express diverse opinions about super intelligence.

Consider the following timeline:

ARC & Open Philanthropy state in a press release “In a sane world, all AGI progress should stop. If we don’t, there’s more than a 10% chance we will all die.”
People at AGI labs working in the safety teams echo this message publicly.
AGI labs leaders who think this state it publicly.
We start coordinating explicitly against orgs (and groups within orgs) that race.
We coordinate on a plan whose final publicly stated goal is to get to a world state that most of us agree is not one where humanity’s entire existence is at risk.
We publicly, relentlessly optimise for this plan, without compromising on our beliefs.

Whenever you lie for personal gain, you fuck up this timeline.

When you start being publicly honest, you will suffer a personal hit in the short term. But we truly believe that, coordinated and honest, we will have timelines much longer than any Scaling Policy will ever get us.

What links here?

Integrity in AI Governance and Advocacy by habryka (3 Nov 2023 19:52 UTC; 134 points)

Connor Leahy and Gabriel Alfour


Deception Honesty Community AI

paulfchristiano 25 Oct 2023 2:55 UTC
LW: 131 AF: 61
72
AF
Here is a short post explaining some of my views on responsible scaling policies, regulation, and pauses. I wrote it last week in response to several people asking me to write something. Hopefully this helps clear up what I believe.
I don’t think I’ve ever hidden my views about the dangers of AI or the advantages of scaling more slowly and carefully. I generally aim to give honest answers to questions and present my views straightforwardly. I often point out that catastrophic risk would be lower if we could coordinate to build AI systems later and slower; I usually caveat that doing so seems costly and politically challenging and so I expect it to require clearer evidence of risk.
ryan_greenblatt 24 Oct 2023 15:30 UTC
LW: 103 AF: 41
64
AF
I think this post is quite misleading and unnecessarily adversarial.

~~I’m not sure if I want to engage further, I might give examples of this later.~~ (See examples below)

(COI: I often talk to and am friendly with many of the groups criticized in this post.)
- ryan_greenblatt 24 Oct 2023 15:53 UTC
  LW: 134 AF: 65
  68
  AF Parent
  Examples:
  - It seems to conflate scaling pauses (which aren’t clearly very useful) with pausing all AI related progress (hardware, algorithmic development, software). Many people think that scaling pauses aren’t clearly that useful due to overhang issues, but hardware pauses are pretty great. However, hardware development and production pauses would clearly be extremely difficult to implement. IMO the sufficient pause AI ask is more like “ask nvidia/tsmc/etc to mostly shut down” rather than “ask AGI labs to pause”.
  - More generally, the exact type of pause which would actually be better than (e.g.) well implemented RSPs is a non-trivial technical problem which makes this complex to communicate. I think this is a major reason why people don’t say stuff like “obviously, a full pause with XYZ characteristics would be better”. For instance, if I was running the US, I’d probably slow down scaling considerably, but I’d mostly be interested in implementing safety standards similar to RSPs due to lack of strong international coordination.
  - The post says “many people believe” a “pause is necessary” claim^[1], but the exact claim you state probably isn’t actually believed by the people you cite below without additional complications. Like what exact counterfactuals are you comparing? For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed). So as an example, I don’t believe a scaling pause is necessary and other interventions would probably reduce risk more (while also probably being politically easier). And, I think a naive “AI scaling pause” doesn’t reduce risk that much, certainly less than a high quality US regulatory agency which requires and reviews something like RSPs. When claiming “many people believe”, I think you should make a more precise claim that the people you name actually believe.
  - Calling something a “pragmatic middle ground” doesn’t imply that there aren’t better options (e.g., shut down the whole hardware industry).
  - For instance, I don’t think it’s “lying” when people advocate for partial reductions in nuclear arms without noting that it would be better to secure sufficient international coordination to guarantee world peace. Like world peace would be great, but idk if it’s necessary to talk about. (There is probably less common knowledge in the AI case, but I think this example mostly holds.)
  - This post says “When called out on this, most people we talk to just fumble.”. I strongly predict that the people actually mentioned in the part above this (Open Phil, Paul, ARC evals, etc) don’t actually fumble and have a reasonable response. So, I think this misleadingly conflates the responses of two different groups at best.
  - More generally, this post seems to claim people have views that I don’t actually think they have and assumes the motives for various actions are powerseeking without any evidence for this.
  - The use of the term lying seems like a case of “noncentral fallacy” to me. The post presupposes a communication/advocacy norm and states violations of this norm should be labeled “lying”. I’m not sure I’m sold on this communication norm in the first place. (Edit: I think “say the ideal thing” shouldn’t be a norm (something where we punish people who violate this), but it does seem probably good in many cases to state the ideal policy.)
  [1] The exact text from the post is:
  
  In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.
  
  Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.
  - ryan_greenblatt 24 Oct 2023 16:35 UTC
    LW: 29 AF: 14
    25
    AF Parent
    The title doesn’t seem supported by the content. The post doesn’t argue that people are being cowardly or aren’t being strategic (it does argue they are incorrect and seeking power in an immoral way, but this is different).
    - Shankar Sivarajan 26 Oct 2023 14:13 UTC
      7 points
      4
      Parent
      As an aside, this seems to be a general trend: I have seen people defend misleading headlines on news articles with suggestions that the title should be judged independent of the content. I disagree.
      - Eli Tyre 26 Oct 2023 17:14 UTC
        4 points
        4
        Parent
        Well, the author of an article often doesn’t decide the title of the post. The editor does that.
        
        So it can be the case that an author wrote a reasonable and nuanced piece, and then the editor added an outrageous click-bait headline.
        M. Y. Zuo 26 Oct 2023 19:21 UTC
        0 points
        −2
        Parent
        Yes, but the author wasn’t forced at gunpoint, presumably, to work with that particular editor. So then the question can be reframed as: why did the author choose to work with an editor that seems untrustworthy?
        Michael Levine 27 Oct 2023 23:31 UTC
        3 points
        0
        Parent
        Journalists at most news outlets do not choose which editor(s) they work with on a given story, except insofar as they choose to not quit their job. This does not feel like a fair basis on which to hold the journalist responsible for the headline chosen by their editor(s).
        M. Y. Zuo 29 Oct 2023 0:31 UTC
        1 point
        0
        Parent
        Why does it not feel like a fair basis?
        Maybe if they were deceived into thinking the editor was genuine and trustworthy, but otherwise if they knew they were working with someone untrustworthy, and they still chose to associate their names together publicly, then obviously it impacts their credibility.
        Michael Levine 30 Oct 2023 12:59 UTC
        1 point
        2
        Parent
        Insofar as a reporter works for an outlet that habitually writes misleading headlines, that does undermine the credibility of the reporter, but that’s partly true because outlets that publish grossly misleading headlines tend to take other ethical shortcuts as well. But without that general trend or a broader assessment of an outlet’s credibility, it’s possible that an otherwise fair story would get a misleading headline through no fault of the reporter, and it would be incorrect to judge the reporter for that (as Eli says above).
  - DanielFilan 25 Oct 2023 7:27 UTC
    LW: 8 AF: 6
    5
    AF Parent
    
    For instance, if I was running the US, I’d probably slow down scaling considerably, but I’d mostly be interested in implementing safety standards similar to RSPs due to lack of strong international coordination.
    
    Surely if you were running the US, that would be a great position to try to get international coordination on policies you think are best for everyone?
    - ryan_greenblatt 25 Oct 2023 20:52 UTC
      LW: 4 AF: 3
      2
      AF Parent
      Sure, but seems reasonably likely that it would be hard to get that much international coordination.
      - DanielFilan 26 Oct 2023 15:50 UTC
        LW: 2 AF: 2
        0
        AF Parent
        Maybe—but you definitely can’t get it if you don’t even try to communicate the thing you think would be better.
  - Joe_Collman 24 Oct 2023 17:51 UTC
    LW: 8 AF: 5
    −1
    AF Parent
    [I agree with most of this, and think it’s a very useful comment; just pointing out disagreements]
    For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed).
    I assume this would be a crux with Connor/Gabe (and I think I’m at least much less confident in this than you appear to be).
    We’re already in a world where stopping appears necessary.
    It’s entirely possible we all die before stopping was clearly necessary.
    What gives you confidence that RSPs would actually trigger a pause?
    If a lab is stopping for reasons that aren’t based on objective conditions in an RSP, then what did the RSP achieve?
    Absent objective tests that everyone has signed up for, a lab may well not stop, since there’ll always be the argument “Well we think that the danger is somewhat high, but it doesn’t help if only we pause”.
    It’s far from clear that we’ll get objective and sufficient conditions for safety (or even for low risk). I don’t expect us to—though it’d obviously be nice to be wrong.
    [EDIT: or rather, ones that allow scaling to continue safely—we already know sufficient conditions for safety: stopping]
    Calling something a “pragmatic middle ground” doesn’t imply that there aren’t better options
    I think the objection here is more about what is loosely suggested by the language used, and what is not said—not about logical implications. What is loosely suggested by the ARC Evals language is that it’s not sensible to aim for the more “extreme” end of things (pausing), and that this isn’t worthy of argument.
    Perhaps ARC Evals have a great argument, but they don’t make one. I think it’s fair to say that they argue the middle ground is practical. I don’t think it can be claimed they argue for pragmatic until they address both the viability of other options, and the risks of various courses. Doing a practical thing that would predictably lead to higher risk is not pragmatic.
    It’s not clear what the right course is here, but making no substantive argument gives a completely incorrect impression. If they didn’t think it was the right place for such an argument, then it’d be easy to say that: that this is a complex question, that it’s unclear this course is best, and that RSPs vs Pause vs … deserves a lot more analysis.
    The post presupposes a communication/advocacy norm and states violations of this norm should be labeled “lying”. I’m not sure I’m sold on this communication norm in the first place.
    I’d agree with that, but I do think that in this case it’d be useful for people/orgs to state both a [here’s what we’d like ideally] and a [here’s what we’re currently pushing for]. I can imagine many cases where this wouldn’t hold, but I don’t see the argument here. If there is an argument, I’d like to hear it! (fine if it’s conditional on not being communicated further)
    - ryan_greenblatt 24 Oct 2023 18:05 UTC
      LW: 10 AF: 6
      1
      AF Parent
      Thanks for the response, one quick clarification in case this isn’t clear.
      
      On:
      
      For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed).
      
      I assume this would be a crux with Connor/Gabe (and I think I’m at least much less confident in this than you appear to be).
      
      It’s worth noting here that I’m responding to this passage from the text:
      
      In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.
      
      Many people in the AI safety community believe this, but they have not stated it publicly. Worse, they have stated different beliefs more saliently, which misdirect everyone else about what should be done, and what the AI safety community believes.
      
      I’m responding to the “many people believe this” which I think implies that the groups they are critiquing believe this. I want to contest what these people believe, not what is actually true.
      
      Like many of these people think policy interventions other than pause reduce X-risk below 10%.
      
      Maybe I think something like (numbers not well considered):
      
      P(doom) = 35%
      P(doom | scaling pause by executive order in 2024) = 25%
      P(doom | good version of regulatory agency doing something like RSP and safety arguments passed into law in 2024) = 5% (depends a ton on details and political buy in!!!)
      P(doom | full and strong international coordination around pausing all AI related progress for 10+ years which starts by pausing hardware progress and current manufacturing) = 3%
      
      Note that these numbers take into account evidential updates (e.g., probably other good stuff is happening if we have super strong international coordination around pausing AI).
      - Joe_Collman 24 Oct 2023 19:10 UTC
        LW: 4 AF: 2
        0
        AF Parent
        Ah okay—thanks. That’s clarifying.
        Agreed that the post is at the very least not clear.
        In particular, it’s obviously not true that [if we don’t stop today, there’s more than a 10% chance we all die], and I don’t think [if we never stop, under any circumstances...] is a case many people would be considering at all.
        It’d make sense to be much clearer on the ‘this’ that “many people believe”.
        (and I hope you’re correct on P(doom)!)
    - ryan_greenblatt 24 Oct 2023 18:16 UTC
      LW: 6 AF: 3
      4
      AF Parent
      
      Calling something a “pragmatic middle ground” doesn’t imply that there aren’t better options
      
      I think the objection here is more about what is loosely suggested by the language used, and what is not said—not about logical implications. What is loosely suggested by the ARC Evals language is that it’s not sensible to aim for the more “extreme” end of things (pausing), and that this isn’t worthy of argument.
      
      Perhaps ARC Evals have a great argument, but they don’t make one. I think it’s fair to say that they argue the middle ground is practical. I don’t think it can be claimed they argue for pragmatic until they address both the viability of other options, and the risks of various courses. Doing a practical thing that would predictably lead to higher risk is not pragmatic.
      
      It’s not clear what the right course is here, but making no substantive argument gives a completely incorrect impression. If they didn’t think it was the right place for such an argument, then it’d be easy to say that: that this is a complex question, that it’s unclear this course is best, and that RSPs vs Pause vs … deserves a lot more analysis.
      
      Yeah, I probably want to walk back my claim a bit. Maybe I want to say “doesn’t strongly imply”?
      
      It would have been better if ARC evals noted that the conclusion isn’t entirely obvious. It doesn’t seem like a huge error to me, but maybe I’m underestimating the ripple effects etc.
- ryan_greenblatt 24 Oct 2023 17:46 UTC
  LW: 12 AF: 6
  6
  AF Parent
  As an aside, I think it’s good for people and organizations (especially AI labs) to clearly state their views on AI risk, see e.g., my comment here. So I agree with this aspect of the post.
  
  Stating clear views on what ideal government/international policy would look like also seems good.
  
  (And I agree with a bunch of other misc specific points in the post like “we can maybe push the overton window far” and “avoiding saying true things to retain respectability in order to get more power is sketchy”.)
  
  (Edit: from a communication best practices perspective, I wish I had noted where I agree in the parent comment rather than here.)
Eric Neyman 25 Oct 2023 7:23 UTC
67 points
5
(Conflict of interest note: I work at ARC, Paul Christiano’s org. Paul did not ask me to write this comment. I first heard about the truck (below) from him, though I later ran into it independently online.)
There is an anonymous group of people called Control AI, whose goal is to convince people to be against responsible scaling policies because they insufficiently constrain AI labs’ actions. See their Twitter account and website (~~also anonymous~~ Edit: now identifies Andrea Miotti of Conjecture as the director). (I first ran into Control AI via this tweet, which uses color-distorting visual effects to portray Anthropic CEO Dario Amodei in an unflattering light, in a way that’s reminiscent of political attack ads.)
Control AI has rented a truck that had been circling London’s Parliament Square. The truck plays a video of “Dr. Paul Christiano (Made ChatGPT Possible; Government AI adviser)” saying that there’s a 10-20% chance of an AI takeover and an overall 50% chance of doom, and of Sam Altman saying that the “bad case” of AGI is “lights out for all of us”. The back of the truck says “Responsible Scaling: No checks, No limits, No control”. The video of Paul seems to me to be an attack on Paul (but see Twitter discussion here).
I currently strongly believe that the authors of this post are either in part responsible for Control AI, or at least have been working with or in contact with Control AI. That’s because of the focus on RSPs and because both Connor Leahy and Gabriel Alfour have retweeted Control AI (which has a relatively small following).
Connor/Gabriel—if you are connected with Control AI, I think it’s important to make this clear, for a few reasons. First, if you’re trying to drive policy change, people should know who you are, at minimum so they can engage with you. Second, I think this is particularly true if the policy campaign involves attacks on people who disagree with you. And third, because I think it’s useful context for understanding this post.
Could you clarify if you have any connection (even informal) with Control AI? If you are affiliated with them, could you describe how you’re affiliated and who else is involved?
EDIT: This Guardian article confirms that Connor is (among others) responsible for Control AI.
What links here?
- Alex Mallen's comment on My guess at Conjecture’s vision: triggering a narrative bifurcation by Alexandre Variengien (8 Feb 2024 6:40 UTC; 16 points)
- peterbarnett 25 Oct 2023 18:05 UTC
  20 points
  9
  Parent
  The About Us page from the Control AI website has now been updated to say “Andrea Miotti (also working at Conjecture) is director of the campaign.” This wasn’t the case on the 18th of October.
  Thumbs up for making the connection between the organizations more transparent/clear.
- habryka 25 Oct 2023 7:39 UTC
  13 points
  7
  Parent
  
  The video of Paul seems to me to be an attack on Paul (but see Twitter discussion here).
  
  This doesn’t seem right. As the people in the Twitter discussion you link say, it seems to mostly use Paul as a legitimate source of an x-risk probability, with maybe also a bit of critique of him having nevertheless helped build ChatGPT, but neither seems like an attack in a strictly negative sense. It feels like a relatively normal news snippet or something.
  
  I feel confused about the truck. The video seems fine to me and seems kind of decent advocacy. The quotes used seem like accurate representations of what the people presented believe. The part that seems sad is that it might cause people to think the ones pictured also agree with other things that the responsible scaling website says, which seems misleading.
  
  I don’t particularly see a reason to dox the people behind the truck, though I am not totally sure. My bar against doxxing is pretty high, though I do care about people being held accountable for large scale actions they take.
  - Eric Neyman 25 Oct 2023 8:02 UTC
    5 points
    0
    Parent
    To elaborate on my feelings about the truck:
    If it is meant as an attack on Paul, then it feels pretty bad/norm-violating to me. I don’t know what general principle I endorse that makes it not okay: maybe something like “don’t attack people in a really public and flashy way unless they’re super high-profile or hold an important public office”? If you’d like I can poke at the feeling more. Seems like some people in the Twitter thread (Alex Lawsen, Neel Nanda) share the feeling.
    If I’m wrong and it’s not an attack, I still think they should have gotten Paul’s consent, and I think the fact that it might be interpreted as an attack (by people seeing the truck) is also relevant.
    (Obviously, I think the events “this is at least partially an attack on Paul” and “at least one of the authors of this post is connected to Control AI” are positively correlated, since this post is an attack on Paul. My probabilities are roughly 85% and 97%*, respectively.)
    *For a broad-ish definition of “connected to”
    I don’t particularly see a reason to dox the people behind the truck, though I am not totally sure. My bar against doxxing is pretty high, though I do care about people being held accountable for large scale actions they take.
    That’s fair. I think that it would be better for the world if Control AI were not anonymous, and I judge the group negatively for being anonymous. On the other hand, I don’t think I endorse them being doxxed. So perhaps my request to Connor and Gabriel is: please share what connection you have to Control AI, if any, and share what more information you have permission to share.
- DanielFilan 25 Oct 2023 16:07 UTC
  11 points
  17
  Parent
  
  Connor/Gabriel—if you are connected with Control AI, I think it’s important to make this clear, for a few reasons. First, if you’re trying to drive policy change, people should know who you are, at minimum so they can engage with you. Second, I think this is particularly true if the policy campaign involves attacks on people who disagree with you. And third, because I think it’s useful context for understanding this post.
  
  This seems like a general-purpose case against anonymous political speech that contains criticism (“attacks”) of the opposition. But put like that, it seems like there are lots of reasons people might want to speak anonymously (e.g. to shield themselves from unfair blowback). And your given reasons don’t seem super persuasive—you can engage with people who say they agree with the message (or do broad-ranged speech of your own), reason 2 isn’t actually a reason, and the post was plenty understandable to me without the context.
- [ ]
  [deleted]
Eric Neyman 25 Oct 2023 9:17 UTC
66 points
33
(Note: I work with Paul at ARC theory. These views are my own and Paul did not ask me to write this comment.)
I think the following norm of civil discourse is super important: do not accuse someone of acting in bad faith, unless you have really strong evidence. An accusation of bad faith makes it basically impossible to proceed with discussion and seek truth together, because if you’re treating someone’s words as a calculated move in furtherance of their personal agenda, then you can’t take those words at face value.
I believe that this post violates this norm pretty egregiously. It begins by saying that hiding your beliefs “is lying”. I’m pretty confident that the sort of belief-hiding being discussed in the post is not something most people would label “lying” (see Ryan’s comment), and it definitely isn’t a central example of lying. (And so in effect it labels a particular behavior “lying” in an attempt to associate it with behaviors generally considered worse.)
The post then confidently asserts that Paul Christiano hides his beliefs in order to promote RSPs. This post presents very little evidence that this is what’s going on, and Paul’s account seems consistent with the facts (and I believe him).
So in effect, it accuses Paul and others of lying, cowardice, and bad faith on what I consider to be very little evidence.
Edited to add: What should the authors have done instead? I think they should have engaged in a public dialogue with one or more of the people they call out / believe to be acting dishonestly. The first line of the dialogue should maybe have been: “I believe you have been hiding your beliefs, for [reasons]. I think this is really bad, for [reasons]. I’d like to hear your perspective.”
- Jiro 3 Nov 2023 20:19 UTC
  5 points
  0
  Parent
  
  It begins by saying that hiding your beliefs “is lying”. I’m pretty confident that the sort of belief-hiding being discussed in the post is not something most people would label “lying”
  
  Hiding your beliefs in ways that predictably leads people to believe false things is lying.
307th 24 Oct 2023 21:02 UTC
66 points
18
I believe you’re wrong on your model of AI risk and you have abandoned the niceness/civilization norms that act to protect you from the downside of having false beliefs and help you navigate your way out of them. When people explain why they disagree with you, you accuse them of lying for personal gain rather than introspect about their arguments deeply enough to get your way out of the hole you’re in.

First, this is a minor point where you’re wrong, but it’s also a sufficiently obvious point that it should hopefully make clear how wrong your world model is: AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I’d like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bring up how CEOs of leading AI companies acknowledge AI risk as a talking point, so I’d hope that on some level you’re aware that your success in public advocacy would be massively reduced in the counterfactual case where the leading AI orgs are Google Brain, Meta, and NVIDIA, and their leaders were saying “AI risk? Sounds like sci-fi nonsense!”

The fact that people disagree with your preferred method of reducing AI risk does not mean that they are EVIL LIARS who are MAKING YOUR JOB HARDER and DOOMING US ALL.

Second, the reason that a total stop is portrayed as an extreme position is because it is. You can think a total stop is correct while acknowledging that it is obviously an extreme course of action that would require TREMENDOUS international co-ordination and would have to last across multiple different governments. You would need both Republicans and Democrats in America behind it, because both will be in power across the duration of your indefinite stop, and ditto for the leadership of every other country. It would require military action to be taken against people who violate the agreement. This total stop would not just impact AI, because you would need insanely strong regulations on compute—it would impact everyone’s day to day life. The level of compute you’d have to restrict would only escalate as time went on due to Moore’s law. And you and others talk about carrying this on for decades. This is an incredibly extreme position that requires pretty much everyone in the world to agree AI risk is both real and imminent, which they don’t. Leading to...

Third: most people—both AI researchers and the general public—are not seriously concerned about AI risk. No, I don’t believe your handful of sketchy polls. On the research side, whether it’s on the machine learning subreddit, on ML specific discords, or within Yoshua Bengio’s own research organization^[1], the consensus in any area that isn’t specifically selected for worrying about AI risk is always that it’s not a serious concern. And on the public side, hopefully everyone realizes that awareness & agreement on AI risk is far below where climate change is.
Your advocacy regularly assumes that there is a broad consensus among both researchers and the public that AI risk is a serious concern. Which makes sense because this is the only way you can think a total stop is at all plausible. But bad news: there is nowhere close to such a consensus. And if you think developing one is important, you should wake up every morning & end every day praising Sam Altman, Dario Amodei, and Demis Hassabis for raising the profile of AI risk to such an extent; but instead you attack them, out of a misguided belief that somehow, if not for them, AI progress wouldn’t happen.

Which leads us to number four: No, you can’t get a total stop on AI progress through individual withdrawal. You and others in the stop AI movement regularly use the premise that if only OpenAI + Anthropic + DeepMind would just stop, AI would never get developed and we could all live happily ever after, so therefore they are KILLING US ALL.

This is false. Actually, there are many people and organizations that do not believe AI risk is a serious concern and only see AI as a technology with massive potential economic benefits; as long as this is the case, AI progress will continue. This is not a prisoner’s dilemma where if only all the people worried about AI risk would “co-operate” (by ceasing AI work) AI would stop. Even if they all stopped tomorrow, progress would continue.

If you want to say they should stop anyway because that would slow timelines, I would like to point out that that is completely different from a total stop and cannot be justified by praising the virtues of a total stop. Moreover, it has the absolutely massive drawback that now AI is getting built by a group of people who were selected for not caring about AI risk.

Advocating for individual withdrawal by talking about how good a total, globally agreed upon stop would be is deceptive—or, if I wanted to use your phrasing, I could say that doing so is LYING, presumably FOR PERSONAL GAIN and you’re going to GET US ALL KILLED you EVIL PERSON. Or I guess I could just not do all that and just explain why I disagree with you—I wonder which method is better?

Fifth, you can’t get a total stop on AI progress at all, and that’s why no one will advocate for one. This follows from points two and three and four. Even if somehow everyone agreed that AI risk was a serious issue a total stop would still not happen the same way that people believing in climate change did not cause us to abandon gasoline.
Sixth, if you want to advocate for a total stop, that’s your prerogative, but you don’t get to choose that that’s the only way. In theory there is nothing wrong with advocating for a total stop even though it is completely doomed. After all, nothing will come of it and maybe you’ll raise awareness of AI risk while you’re doing it.
The problem is that you are dead set on torching other alignment plans to the ground all for the sake of your nonworkable idea. Obviously you are going after AI capabilities people all the time but here you are also going against people who simply advocate for positions less stringent than you. Everyone needs to fall in line and advocate for your particular line of action that will never happen and if they don’t they are liars and going to kill us all. This is where your abdication from normal conversational norms makes your wrong beliefs actively harmful.
Leading to point number seven, we should talk about AI risk without constantly accusing each other of killing us all. What? But if I believe Connor’s actions are bad for AI risk surely that means I should be honest and say he’s killing us all, right? No, the same conversational norms that work for discussing a tax reform apply just as much here. You’re more likely to get a good tax reform if you talk it out in a civil manner, and the same goes for AI risk. I reject the idea that being hysterical and making drastic accusations actually helps things, I reject the idea that the long term thinking and planning that works best for literally every other issue suddenly has to be abandoned in AI risk because the stakes are so high, I reject the idea that the only possible solution is paralysis.

Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan that raises odds of success immensely. I know, you’re so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it’s still true—and what’s more, obviously so. I don’t know how you and others egged each other into the position that it doesn’t matter whether the people working on AI care about AI risk, but it’s insane.
1. ^
  From a recent interview:
  
  D’Agostino: How did your colleagues at Mila react to your reckoning about your life’s work?
  Bengio: The most frequent reaction here at Mila was from people who were mostly worried about the current harms of AI—issues related to discrimination and human rights. They were afraid that talking about these future, science-fiction-sounding risks would detract from the discussion of the injustice that is going on—the concentration of power and the lack of diversity and of voice for minorities or people in other countries that are on the receiving end of whatever we do.
- Neel Nanda 25 Oct 2023 9:13 UTC
  23 points
  18
  Parent
  
  Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan that raises odds of success immensely. I know, you’re so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it’s still true—and what’s more, obviously so. I don’t know how you and others egged each other into the position that it doesn’t matter whether the people working on AI care about AI risk, but it’s insane.
  
  I agreed with most of your comment until this line. Is your argument that there’s a lot of nuance to getting safety right, we’re plausibly in a world where alignment is hard but possible, and the makers of AGI deeply caring about alignment, being cautious and not racing, etc, could push us over the line of getting alignment to work? I think this argument seems pretty reasonable, but that you’re overstating the case here and that this strategy could easily be net bad if you advance capabilities a lot. And “alignment is basically impossible unless something dramatically changes” also seems like a defensible position to me.
  - 307th 25 Oct 2023 11:22 UTC
    4 points
    0
    Parent
    I don’t expect most people to agree with that point, but I do believe it. It ends up depending on a lot of premises, so expanding on my view there in full would be a whole post of its own. But to try to give a short version:
    
    There are a lot of specific reasons I think having people working in AI capabilities is so strongly +EV. But I don’t expect people to agree with those specific views. The reason I think it’s obvious is that even when I make massive concessions to the anti-capabilities people, these organizations… still seem +EV? Let’s make a bunch of concessions:
    
    1. Alignment will be solved by theoretical work unrelated to capabilities. It can be done just as well at an alignment-only organization with limited funding as it can at a major AGI org with far more funding.
    
    2. If alignment is solved, that automatically means future ASI will be built using this alignment technique, regardless of whether leading AI orgs actually care about alignment at all. You just publish a paper saying “alignment solution, pls use this Meta” and Meta will definitely do it.
    
    3. Alignment will take a significant amount of time—probably decades.
    
    4. ASI is now imminent; these orgs have reduced timelines to ASI by 1-5 years.
    
    5. Our best chance of survival is a total stop, which none of the CEOs of these orgs support.
    
    Even given all five of these premises… Demis Hassabis, Dario Amodei, and Sam Altman have all increased the chance of a total stop, by a lot. By more than almost anyone else on the planet, in fact. Yes, even though they don’t think it’s a good idea right now and have said as much (I think? haven’t followed all of their statements on AI pause).
    
    That is, the chance of a total stop is clearly higher in this world than in the counterfactual one where any of Demis/Dario/Sam didn’t go into AI capabilities, because a CEO of a leading AI organization saying “yeah I think AI could maybe kill us all” is something that by default would not happen. As I said before, most people in the field of AI don’t take AI risk seriously; this was even more true back when they first entered the field. The default scenario is one where people at NVIDIA and Google Brain and Meta are reassuring the public that AI risk isn’t real.
    
    So in other words, they are still increasing our chances of survival, even under that incredibly uncharitable set of assumptions.
    
    Of course, you could cook these assumptions even more in order to make them -EV—if you think that a total stop isn’t feasible, but still believe all of the other four premises, then they’re -EV. Or you could say “yeah, we need a total stop now, because they’ve advanced timelines, but if these orgs didn’t exist then we totally would have solved alignment before Meta made a big transformer model and trained it on a lot of text; so even though they’ve raised the chances of a total stop they’re still a net negative.” Or you could say “the real counterfactual about Sam Altman isn’t if he didn’t enter the field. The real counterfactual is the one where he totally agreed with all of my incredibly specific views and acted based on those.”
    
    I.e. if you’re looking for excuses to be allowed to believe that these orgs are bad, you’ll find them. But that’s always the case. Under real worldviews—even under Connor’s worldview, where he thinks a total stop is both plausible and necessary—OAI/DM/Anthropic are all helping with AI risk. Which means that their beneficiality is incredibly robust, because again, I think many of the assumptions I outlined above are false & incredibly uncharitable to AGI orgs.
    - rotatingpaguro 25 Oct 2023 22:35 UTC
      3 points
      0
      Parent
      That is, the chance of a total stop is clearly higher in this world than in the counterfactual one where any of Demis/Dario/Sam didn’t go into AI capabilities, because a CEO of a leading AI organization saying “yeah I think AI could maybe kill us all” is something that by default would not happen. As I said before, most people in the field of AI don’t take AI risk seriously; this was even more true back when they first entered the field. The default scenario is one where people at NVIDIA and Google Brain and Meta are reassuring the public that AI risk isn’t real.
      I have the impression that the big guys started taking AI risk seriously when they saw capabilities that impressed them. So I expect that if Musk, Altman & the rest of the Dreamgrove had not embarked on pushing the frontier faster than it was moving otherwise, AI researchers would have taken it just as seriously at the same capability level. Famous AI scientists already knew about the AI risk arguments; where OpenAI made a difference was not in telling them about AI risk, but shoving GPT up their nose.
      I think the public would then have been able to side with Distinguished Serious People raising warnings about the dangers of ultra-intelligent machines even if Big Corp claimed otherwise.
- rotatingpaguro 24 Oct 2023 21:28 UTC
  13 points
  2
  Parent
  First, this is a minor point where you’re wrong, but it’s also a sufficiently obvious point that it should hopefully make clear how wrong your world model is: AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I’d like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bring up how CEOs of leading AI companies acknowledge AI risk as a talking point, so I’d hope that on some level you’re aware that your success in public advocacy would be massively reduced in the counterfactual case where the leading AI orgs are Google Brain, Meta, and NVIDIA, and their leaders were saying “AI risk? Sounds like sci-fi nonsense!”
  
  The fact that people disagree with your preferred method of reducing AI risk does not mean that they are EVIL LIARS who are MAKING YOUR JOB HARDER and DOOMING US ALL.
  I disagree this is obviously wrong. I think you are not considering the correct counterfactual. From Connor L.’s point of view, the guys at the AI labs are genuinely worried about existential risk, but run 4D chess algorithms to determine they have to send mixed signals about it. Since Connor thinks these decisions run counter to the goal, counterfactually they are making his life harder by not just stating their worry and its consequence clearly. The counterfactual is not with “if AI labs did not exist”. That said, I’m not so confident I understand what he’s thinking, but you are excluding a reasonable possibility and so it’s not <obvious> as you say.
  Overall, I think your comment is one of those cases where you indulge in the same sin you want to point out. See e.g. your overconfident epilogue.
  - 307th 24 Oct 2023 21:39 UTC
    14 points
    14
    Parent
    Yeah, fair enough.
    
    But I don’t think that would be a sensible position. The correct counterfactual is in fact the one where Google Brain, Meta, and NVIDIA led the field. Like, if DM + OpenAI + Anthropic didn’t exist—something he has publicly wished for—that is in fact the most likely situation we would find. We certainly wouldn’t find CEOs who advocate for a total stop on AI.
- 307th 24 Oct 2023 21:27 UTC
  2 points
  1
  Parent
  (Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa)
Eli Tyre 24 Oct 2023 20:54 UTC
LW: 49 AF: 19
12
AF
Man, I agree with almost all the content of this post, but dispute the framing. This seems like maybe an opportunity to write up some related thoughts about transparency in the x-risk ecosystem.
A few months ago, I had opportunity to talk with a number of EA-aligned or x-risk concerned folks working in policy or policy adjacent roles as part of a grant evaluation process. My views here are informed by those conversations, but I am overall quite far from the action of AI policy stuff. I try to carefully flag my epistemic state regarding the claims below.
Omission
I think a lot of people, especially in AI governance, are…
1. Saying things that they think are true
2. while leaving out other important things that they think are true, but are also so extreme or weird-sounding that they would lose credibility.
A central example is promoting regulations on frontier AI systems because powerful AI systems could develop bio-weapons that could be misused to wipe out large swaths of humanity.
I think that most of the people promoting that policy agenda with that argumentation do in fact think that AI-developed bioweapons are a real risk of the next 15 years. And, I guess, many to most of them think that there is also a risk of an AI takeover (including one that results in human extinction), within a similar timeframe. They’re in fact more concerned about the AI takeover risk, but they’re focusing on the bio-weapons misuse case, because that’s more defensible, and (they think) easier to get others to take seriously.^[1] So they’re more likely to succeed in getting their agenda passed into law, if they focus on those more-plausible sounding risks.
This is not, according to me, a lie. They are not making up the danger of AI-designed bio-weapons. And it is normal, in politics, to not say many things that you think and believe. If a person was asked point-blank about the risk of AI takeover, and they gave an answer that implied the risk was lower than they think it is, in private, I would consider that a lie. But failing to volunteer that info when you’re not being asked for it is something different.^[2]
However, I do find this dynamic of saying defensible things in the Overton window, and leaving out your more extreme beliefs, concerning.

It is on the table that we will have Superintelligence radically transforming planet earth by 2028. And government actors who might be able to take action on that now are talking to advisors who do think that that kind of radical transformation is possible in that near a time frame. But those advisors hold back from telling the government actors that they think that, because they expect to lose the credibility they have.
This sure looks sus to me. It sure seems like a sad world where almost all of the people that were in a position to give a serious warning to the people in power, opted not to, and so the people in power didn’t take the threat seriously until it was too late.

But it is important to keep in mind that the people I am criticizing are much closer to the relevant action than I am. They may just be straightforwardly correct that they will be discredited if they talk about Superintelligence in the near term.
I would be pretty surprised by that, given that eg OpenAI is talking about Superintelligence in the near term. And it overall becomes a lot less weird to talk about if 50 people from FHI, OpenPhil, the labs, etc. are openly saying that they think the risk of human extinction is >10%, instead of just that weird, longstanding kooky-guy, Eliezer Yudkowsky.

And it seems like if you lose credibility for soothsaying, and then your soothsaying looks like it’s coming true, you will earn your credibility back later? I don’t know if that’s actually how it works in politics.

But I’m not an expert here. I’ve heard at least one second hand anecdote of an EA in DC “coming out” as seriously concerned about Superintelligence and AI takeover risk, and losing points for doing so.
And overall, I have 1000x less experience engaging with government than these people, who have specialized in this kind of thing. I suspect that they’re pretty calibrated about how different classes of people will react.

I am personally not sure how to balance advocating for a policy that seems more sensible and higher integrity to me, on my inside view, with taking into account the expertise of the people in these positions. For the time being, I’m trying to be transparent that my inside view wishes that EA policy people should be much more transparent about what they think, while also not punishing those people for following a different standard.
Belief-suppression
However, it gets worse than that. It’s not only that many policy folks are not expressing their full beliefs, I think they’re further exerting pressure on others not to express their full beliefs.
When I talk to EA people working in policy, about new people entering the advocacy space, they almost universally express some level of concern, due to “poisoning the well” dynamics.
To lay out an example of poisoning the well:

Let’s say that some young EAs are excited about the opportunities to influence AI policy. They show up in DC and manage to schedule meetings with staffers. They talk about AI and AI risk, and maybe advocate for some specific policy like a licensing regime.
But they’re amateurs. They don’t really know what they’re doing, and they commit a bunch of faux pas, revealing that they don’t know important facts about the relevant coalitions in Congress, or which kinds of things are at all politically feasible. The staffers mark these people as unserious fools, who don’t know what they’re talking about, and who wasted their time. They disregard whatever proposal was put forward as unserious. (The staffer doesn’t let on about this though. Standard practice is to act polite, and then laugh about the meeting with your peers over drinks.)
Then, 6 months later, a different, more established advocacy group or think tank comes forward with a very similar policy. But they’re now fighting an uphill battle, since people in government have already formed associations with that policy, and with the general worldview.
As near as I can tell, I think this poisoning the well effect is real.
People in government are overwhelmed with ideas, policies, and decisions. They don’t have time to read the full reports, and often make relatively quick judgments.
And furthermore, they’re used to reasoning according to a coalition logic. Getting legislation passed is not just a matter of whether it is a good idea, but largely depends on the social context of the legislation. Who an idea is associated with is a strong determinant of whether to take it seriously.^[3]
But this dynamic causes some established EA DC policy people to be wary of new people entering the space unless they already have a lot of policy experience, such that they can avoid making those kinds of faux pas. They would prefer that anyone entering the space have high levels of native social tact, and additionally to be familiar with DC etiquette.
I don’t know this to be the case, but I wouldn’t be surprised if people’s sense of “DC etiquette” includes not talking about or not focusing too much on extreme, sci-fi sounding scenarios. I would guess that one person working in the policy space can mess things up for everyone else in that space, and so that creates a kind of conformity pressure whereby everyone expresses the same sorts of thing.
To be clear, I know that that isn’t happening universally. There’s at least one person that I talked to, working at org X, who suggested the opposite approach: they wanted a new advocacy org to explicitly not try to sync their messaging with org X. They thought it made more sense for different groups, especially if they had different beliefs about what’s necessary for a good future, to advocate for different policies.

But my guess is that there’s a lot of this kind of thing, where there’s a social pressure, amongst EA policy people, toward revealing less of one’s private beliefs, lest one be seen as something of a loose cannon.

Even insofar as my inside view is mistaken about how productive it would be to speak straightforwardly, there’s an additional question of how well-coordinated this kind of policy should be. My guess is that by trying to all stay within the Overton window, the EA policy ecosystem as a whole is preventing the Overton window from shifting, and it would be better if there were less social pressure towards conformity, to enable more cascading social updates.
1. ^
  I’m sure that some of those folks would deny that they’re more concerned about AI takeover risks. Some of them would claim something like agnosticism about which risks are biggest.
2. ^
  That said, my guess is that many of the people that I’m thinking of, in these policy positions, if they were asked, point blank, might lie in exactly that way. I have no specific evidence of that, but it does seem like the most likely way many of them would respond, given their overall policy about communicating their beliefs.
  
  I think that kind of lying is very bad, both misleading the person or people who are seeking info from you and a defection on our collective discourse commons by making it harder for everyone who agrees with you to say what is true.
  
  And anyone who might be tempted to lie in a situation like that should take some time in advance to think through how they could respond in a way that is both an honest representation of their actual beliefs and also not disruptive to their professional and political commitments.
3. ^
  And there are common knowledge effects here. Maybe some bumbling fools present a policy to you. You happen to have the ability to assess that their policy proposal is actually a really good idea. But you know that the bumbling fools also presented to a number of your colleagues, who are now snickering at how dumb and non-savvy they were.
- habryka 24 Oct 2023 21:46 UTC
  LW: 37 AF: 14
  8
  AF Parent
  If a person was asked point-blank about the risk of AI takeover, and they gave an answer that implied the risk was lower than they think it is, in private, I would consider that a lie
  [...]
  That said, my guess is that many of the people that I’m thinking of, in these policy positions, if they were asked, point blank, might lie in exactly that way. I have no specific evidence of that, but it does seem like the most likely way many of them would respond, given their overall policy about communicating their beliefs.
  As a relevant piece of evidence here, Jason Matheny, when asked point-blank in a senate committee hearing about “how concerned should we be about catastrophic risks from AI?” responded with “I don’t know”, which seems like it qualifies as a lie by the standard you set here (which, to be clear, I don’t super agree with and my intention here is partially to poke holes in your definition of a lie, while also sharing object-level relevant information).
  See this video 1:39:00 to 1:43:00: https://www.hsgac.senate.gov/hearings/artificial-intelligence-risks-and-opportunities/
  Quote (slightly paraphrased because transcription is hard):
  Senator Peters: “The last question before we close. We’ve heard thoughts from various experts about the risk of human-like artificial intelligence or Artificial General Intelligence, including various catastrophic projections. So my final question is, what is the risk that Artificial General Intelligence poses, and how likely is that to matter in the near future?”
  [...]
  Matheny: “As is typically my last words: I don’t know. I think it’s a really difficult question. I think whether AGI is nearer or farther than thought, I think there are things we can do today in either case. Including regulatory frameworks that include standards with third party tests and audits, governance of supply chains so we can understand where large amounts of computing is going, and so that we can prevent large amounts of computing going to places with lower ethical standards than we and other democracies have.”
  Given my best model of Matheny’s beliefs, this sure does not seem like an answer that accurately summarizes his beliefs here, and represents a kind of response that I think causes people to be quite miscalibrated about the beliefs of experts in the field.
  In my experience people raise the hypothetical of “but they would be honest when asked point blank” to argue that people working in the space are not being deceptive. However, I have now seen people being asked point blank, and I haven’t seen them be more honest than their original evasiveness implied, so I think this should substantially increase people’s priors on people doing something more deceptive here.
  Jason Matheny is approximately the most powerful person in the AI policy space. I think he is setting a precedent here for making statements that meet at least the definition of lying you set out in your comment (I am still unsure whether to count that as lying, though it sure doesn’t feel honest), and if anything, when I talk to people in the field, Matheny is generally known as being among the more open and honest people in the space.
  - Eli Tyre 24 Oct 2023 22:59 UTC
    11 points
    10
    Parent
    If his beliefs are what I would have expected them to be (e.g. something like “agrees with the basic arguments laid out in Superintelligence, and was motivated to follow his current career trajectory by those arguments”), then this answer is, at best, misleading and a misrepresentation of his actual models.
    Seeing this particular example, I’m on the fence about whether to call it a “lie”. He was asked about the state of the world, not about his personal estimates, and he answered in a way that was more about the state of knowable public knowledge rather than his personal estimate. But I agree that seems pretty hair-splitting.
    
    As it is, I notice that I’m confused.
    Why wouldn’t he say something to the effect of the following?
    I don’t know; this kind of forecasting is very difficult, timelines forecasting is very difficult. I can’t speak with confidence one way or the other. However, my best guess from following the literature on this topic for many years is that the catastrophic concerns are credible. I don’t know how probable it is, but it does not seem to me that it is merely an outlandish sci-fi scenario that AI will lead to human extinction, and it is not out of the question that that will happen in the next 10 years.
    That doesn’t just seem more transparent, and more cooperative with the questioner, it also seems...like an obvious strategic move?
    Does he not, in fact, buy the basic arguments in Superintelligence? Is there some etiquette under which he feels that he shouldn’t say that?
    
    What’s missing from my understanding here?
  - Arthur Conmy 24 Oct 2023 22:31 UTC
    11 points
    1
    Parent
    I think your interpretation is fairly uncharitable. If you have further examples of this deceptive pattern from those sympathetic to AI risk I would change my perspective but the speculation in the post plus this example weren’t compelling:
    I watched the video and firstly Senator Peters seems to trail off after the quoted part and ends his question by saying “What’s your assessment of how fast this is going and when do you think we may be faced with those more challenging issues?”. So straightforwardly his question is about timelines, not about risk as you frame it. Indeed Matheny (after two minutes) literally responds “it’s a really difficult question. I think whether AGI is nearer or farther than thought …” (emphasis different to yours), which makes it likely to me that Matheny is expressing uncertainty about timelines, not risk.
    Overall I agree that this was an opportunity for Matheny to discuss AI x-risk and plausibly it wasn’t the best use of time to discuss the uncertainty of the situation. But saying this is dishonesty doesn’t seem well supported.
    - Ben Pace 25 Oct 2023 0:05 UTC
      14 points
      1
      Parent
      No, the question was about whether there are apocalyptic risks and on what timeline we should be concerned about apocalyptic risks.
      The questioner used the term ‘apocalyptic’ specifically. Three people answered the question, and the first two both also alluded to ‘apocalyptic’ risks and sort of said that they didn’t really think we need to think about that possibility. Them referring to apocalyptic risks goes to show that it was a key part of what the questioner wanted to understand — to what extent these risks are real and on what timeline we’ll need to react to them. My read is not that Matheny actively misled the speaker, but that he avoided answering, which is “hiding” rather than “lying” (I don’t agree with the OP that they’re identical).
      I think the question was unclear so it was more acceptable to not directly address whether there is apocalyptic risk, but I think many people I know would have definitely said “Oh to be clear I totally disagree with the previous two people, there are definitely apocalyptic risks and we are not prepared for them and cannot deal with them after-the-fact (as you just mentioned being concerned about).”
      ———
      More detail on what happened and my thoughts on it:
      Everyone who answered explicitly avoided making timeline predictions and instead talked about where they think the policy focus should be.
      The first person roughly said “We have many problems with AI right now, let’s focus on addressing those.”
      The middle person said the AI problems are all of the sort “people being sent to jail because of an errant ML system”.
      Here’s the middle person in full, clearly responding to the question of whether there are apocalyptic risks to be worried about:
      People ask me what keeps me up at night. AGI does not keep me up at night. And the reason why it doesn’t, is because (as Ms Gibbons mentioned) the problems we are likely to face, with the apocalyptic visions of AGI, are the same problems we are already facing right now, with the systems that are already in play. I worry about people being sent to jail because of an errant ML system. Whether you use some fancy AGI to do the same thing, it’s the same problem… My bet is that the harms we’re going to see, as these more powerful systems come online — even with ChatGPT — are no different from the harms we’re seeing right now. So if we focus our efforts and our energies on governance and regulation and guardrails to address the harms we’re seeing right now, they will be able to adjust as the technology improves. I am not worried that what we put in place today will be out of date or out of sync with the new tech. The new tech is like the old tech, just supercharged.
      Matheny didn’t disagree with them and didn’t address the question of whether it’s apocalyptic, just said he was uncertain, and then listed the policies he wanted to see: setting standards with 3rd party audits, and governance of hardware supply chain to track it and control that it doesn’t go to places that aren’t democracies.
      To not state that you disagree with the last two positions signals that you agree with them, as the absence of your disagreement is evidence of the absence of disagreement. I don’t think Matheny outright said anything false but I think it is a bit misleading to not say “I totally disagree, I think the new tech will be akin to inventing a whole new superintelligent alien species that may kill us all and take over the universe” if something like that is what you believe.
      My read is that he was really trying as hard as he could to not address whether there are apocalyptic risks and instead just focus on encouraging the sorts of policies he thought should be implemented.
      - Eli Tyre 9 Nov 2023 18:22 UTC
        2 points
        2
        Parent
        My read is that he was really trying as hard as he could to not address whether there are apocalyptic risks and instead just focus on encouraging the sorts of policies he thought should be implemented.
        Why, though?
        
        Does he know something we don’t? Does he think that if he expresses that those risks are real he’ll lose political capital? People won’t put him or his friends in positions of power, because he’ll be branded as a kook?
        
        Is he just in the habit of side-stepping the weird possibilities?
        This looks to me, from the outside, like an unforced error. They were asking the question, about some core beliefs, pretty directly. It seems like it would help if, in every such instance, the EA people who think that the world might be destroyed by AGI in the next 20 years, say that they think that the world might be destroyed by AGI in the next 20 years.
    - habryka 25 Oct 2023 0:36 UTC
      3 points
      0
      Parent
      As Ben said, this seems incongruent with the responses that the other two people gave, neither of which talked that much about timelines, but did seem to directly respond to the concern about catastrophic/apocalyptic risk from AGI.
      I do agree that it’s plausible that Matheny somehow understood the question differently from the other two people, and interpreted it in a more timelines focused way, though he also heard the other two people talk, which makes that somewhat less likely. I do agree that the question wasn’t asked in the most cogent way.
      - Arthur Conmy 25 Oct 2023 1:03 UTC
        4 points
        0
        Parent
        Thanks for checking this! I mostly agree with all your original comment now (except the first part suggesting it was point blank, but we’re quibbling over definitions at this point), this does seem like a case of intentionally not discussing risk
    - simeon_c 25 Oct 2023 0:07 UTC
      1 point
      −3
      Parent
      A few other examples off the top of my head:
      
      ARC graph on RSPs with the “safe zone” part
      Anthropic calling ASL-4 accidental risks “speculative”
      the recent TIME article saying there’s no trade off between progress and safety
      
      More generally, having talked to many AI policy/safety members, I can say it’s a very common pattern. On the eve of the FLI open letter, one of the most senior persons in the AI governance & policy X risk community was explaining that it was stupid to write this letter and that it would make future policy efforts much more difficult etc.
  - evhub 25 Oct 2023 21:54 UTC
    LW: 10 AF: 6
    8
    AF Parent
    I agree that it is important to be clear about the potential for catastrophic AI risk, and I am somewhat disappointed in the answer above (though I think calling “I don’t know” lying is a bit of a stretch). But on the whole, I think people have been pretty upfront about catastrophic risk, e.g. Dario has given an explicit P(doom) publicly, all the lab heads have signed the CAIS letter, etc.
    
    Notably, though, that’s not what the original post is primarily asking for: it’s asking for people to clearly state that they agree that we should pause/stop AI development, not to clearly state that they think AI poses a catastrophic risk. I agree that people should clearly state that they think there’s a catastrophic risk, but I disagree that people should clearly state that they think we should pause.
    
    Primarily, that’s because I don’t actually think trying to get governments to enact some sort of a generic pause would make good policy. Analogizing to climate change, I think getting scientists to say publicly that they think climate change is a real risk helped the cause, but putting pressure on scientists to publicly say that environmentalism/degrowth/etc. would solve the problem has substantially hurt the cause (despite the fact that a magic button that halved consumption would probably solve climate change).
Joe_Collman 24 Oct 2023 14:22 UTC
LW: 44 AF: 10
23
AF
I agree with most of this, but I think the “Let me call this for what it is: lying for personal gain” section is silly and doesn’t help your case.
The only sense in which it’s clear that it’s “for personal gain” is that it’s lying to get what you want.
Sure, I’m with you that far—but if what someone wants is [a wonderful future for everyone], then that’s hardly what most people would describe as “for personal gain”.
By this logic, any instrumental action taken towards an altruistic goal would be “for personal gain”.
That’s just silly.
It’s unhelpful too, since it gives people a somewhat legitimate reason to dismiss the broader point.
Of course it’s possible that the longer-term altruistic goal is just a rationalization, and people are after power for its own sake, but I don’t buy that this is often true—at least not in any clean [they’re doing this and only this] sense. (one could have similar altruistic-goal-is-rationalization suspicions about your actions too)
In many cases, I think overconfidence is sufficient explanation.
And if we get into “Ah, but isn’t it interesting that this overconfidence leads to power gain”, then I’d agree—but then I claim that you should distinguish [conscious motivations] from [motivations we might infer by looking at the human as a whole, deep shadowy subconscious included]. If you’re pointing at the latter, please make that clear. (and we might also ask “What actions are not for personal gain, in this sense?”)
Again, entirely with you on the rest.
I’m not against accusations that may hurt feelings—but I think that more precision would be preferable here.
- Vaniver 24 Oct 2023 19:11 UTC
  LW: 17 AF: 7
  0
  AF Parent
  The only sense in which it’s clear that it’s “for personal gain” is that it’s lying to get what you want.
  Sure, I’m with you that far—but if what someone wants is [a wonderful future for everyone], then that’s hardly what most people would describe as “for personal gain”.
  If Alice lies in order to get influence, with the hope of later using that influence for altruistic ends, it seems fair to call the influence Alice gets ‘personal gain’. After all, it’s her sense of altruism that will be promoted, not a generic one.
  - Joe_Collman 24 Oct 2023 19:34 UTC
    LW: 19 AF: 10
    17
    AF Parent
    This is not what most people mean by “for personal gain”. (I’m not disputing that Alice gets personal gain)
    Insofar as the influence is required for altruistic ends, aiming for it doesn’t imply aiming for personal gain.
    Insofar as the influence is not required for altruistic ends, we have no basis to believe Alice was aiming for it.
    “You’re just doing that for personal gain!” is not generally taken to mean that you may be genuinely doing your best to create a better world for everyone, as you see it, in a way that many would broadly endorse.
    In this context, an appropriate standard is the post’s own:
    Does this “predictably lead people to believe false things”?
    Yes, it does. (if they believe it)
    “Lying for personal gain” is a predictably misleading description, unless much stronger claims are being made about motivation (and I don’t think there’s sufficient evidence to back those up).
    The “lying” part I can mostly go along with. (though based on a contextual ‘duty’ to speak out when it’s unusually important; and I think I’d still want to label the two situations differently: [not speaking out] and [explicitly lying] may both be undesirable, but they’re not the same thing)
    (I don’t really think in terms of duties, but it’s a reasonable shorthand here)
- Gabriel Alfour 24 Oct 2023 15:22 UTC
  5 points
  2
  Parent
  By this logic, any instrumental action taken towards an altruistic goal would be “for personal gain”.
  I think you are making a genuine mistake, and that I could have been clearer.
  There are instrumental actions that favour everyone (raising epistemic standards), and instrumental actions that favour you (making money).
  The latter are for personal gains, regardless of your end goals.
  Sorry for not getting deeper into it in this comment. This is quite a vast topic.
  I might instead write a longer post about the interactions of deontology & consequentialism, and egoism & altruism.
  - Joe_Collman 24 Oct 2023 15:34 UTC
    4 points
    0
    Parent
    (With “this logic” I meant to refer to [“for personal gain” = “getting what you want”]. But this isn’t important)
    If we’re sticking to instrumental actions that do favour you (among other things), then the post is still incorrect:
    [y is one consequence of x] does not imply [x is for y]
    The “for” says something about motivation.
    Is an action that happens to be to my benefit necessarily motivated by that? No.
    (though more often than I’d wish to admit, of course)
    If you want to claim that it’s bad to [Lie in such a way that you get something that benefits you], then make that claim (even though it’d be rather silly—just “lying is bad” is simpler and achieves the same thing).
    If you’re claiming that people doing this are necessarily lying in order to benefit themselves, then you are wrong. (or at least the only way you’d be right is by saying that essentially all actions are motivated by personal gain)
    If you’re claiming that people doing this are in fact lying in order to benefit themselves, then you should either provide some evidence, or lower your confidence in the claim.
    - Joe_Collman 24 Oct 2023 17:11 UTC
      4 points
      0
      Parent
      If it’s clearer with an example, suppose that the first action on the [most probable to save the world] path happens to get me a million dollars. Suppose that I take this action.
      Should we then say that I did it “for personal gain”?
      That I can only have done it “for personal gain”?
      This seems clearly foolish. That I happen to have gained from an instrumentally-useful-for-the-world action, does not imply that this motivated me. The same applies if I only think this path is the best for the world.
- simeon_c 24 Oct 2023 14:51 UTC
  3 points
  0
  Parent
  I think it still makes sense to have a heuristic of the form “I should have a particularly high bar of confidence if I do something deontologically bad that happens to be good for me personally”.
  - Joe_Collman 24 Oct 2023 15:22 UTC
    2 points
    0
    Parent
    Agreed—though I wouldn’t want to trust that heuristic alone in this area, since in practice the condition won’t be [if I do something deontologically bad] but rather something like [if I notice that I’m doing something that I’m inclined to classify as deontologically bad].
evhub 25 Oct 2023 2:53 UTC
LW: 39 AF: 15
18
AF
I’m happy to state on the record that, if I had a magic button that I could press that would stop all AGI progress for 50 years, I would absolutely press that button. I don’t agree with the idea that it’s super important to trot everyone out and get them to say that publicly, but I’m happy to say it for myself.
- Eli Tyre 25 Oct 2023 23:00 UTC
  10 points
  3
  Parent
  I would like to observe to onlookers that you did in fact say something similar in your post on RSPs. Your very first sentence was:
  Recently, there’s been a lot of discussion and advocacy around AI pauses—which, to be clear, I think is great: pause advocacy pushes in the right direction and works to build a good base of public support for x-risk-relevant regulation.
- Nathaniel Monson 25 Oct 2023 7:34 UTC
  3 points
  0
  Parent
  If I had clear lines in my mind between AGI capabilities progress, AGI alignment progress, and narrow AI progress, I would be 100% with you on stopping AGI capabilities. As it is, though, I don’t know how to count things. Is “understanding why neural net training behaves as it does” good or bad? (SLT’s goal). Is “determining the necessary structures of intelligence for a given architecture” good or bad? (Some strands of mech interp). Is an LLM narrow or general?
  
  How do you tell, or at least approximate? (These are genuine questions, not rhetorical)
Richard_Ngo 24 Oct 2023 15:56 UTC
LW: 24 AF: 13
0
AF
How do you feel about “In an ideal world, we’d stop all AI progress”? Or “ideally, we’d stop all AI progress”?
- Ricardo Meneghin 24 Oct 2023 17:16 UTC
  −1 points
  −7
  Parent
  My interpretation of calling something “ideal” is that it presents that thing as unachievable from the start, and it wouldn’t be your fault if you failed to achieve that, whereas “in a sane world” clearly describes our current behavior as bad and possible to change.
Seth Herd 24 Oct 2023 20:08 UTC
20 points
11
We should shut it all down.

We can’t shut it all down.

The consequences of trying to shut it all down and failing, as we very likely would, could actually raise the odds of human extinction.

Therefore we don’t know what to publicly advocate for.

These are the beliefs I hear expressed by most serious AI safety people. They are consistent and honest.

For instance, see https://forum.effectivealtruism.org/posts/JYEAL8g7ArqGoTaX6/ai-pause-will-likely-backfire.

That post makes two good points:

A pause would: 2) Increasing the chance of a “fast takeoff” in which one or a handful of AIs rapidly and discontinuously become more capable, concentrating immense power in their hands. 3) Pushing capabilities research underground, and to countries with looser regulations and safety requirements.

Obviously these don’t apply to a permanent, complete shutdown. And they’re not entirely convincing even for a pause.

My point is that the issue is complicated.

A complete shutdown seems impossible to maintain for all of humanity. Someone is going to build AGI. The question is who and how.

The call for more honesty is appreciated. We should be honest, and include “obviously we should just not do it”. But you don’t get many words when speaking publicly, so making those your primary point is a questionable strategy.
- William the Kiwi 26 Oct 2023 16:13 UTC
  1 point
  0
  Parent
  We can’t shut it all down.
  Why do you personally think this is correct? Is it that humanity does not know how to shut it down? Or is incapable? Or unwilling?
  - Seth Herd 26 Oct 2023 19:03 UTC
    3 points
    1
    Parent
    This is a good question. It’s worth examining the assumption if it’s the basis of our whole plan.
    
    When I say “we”, I mean “the currently listening audience”, roughly the AI safety community. We don’t have the power to convince humanity to shut down AI research.
    
    There are a few reasons I think this. The primary one is that humanity isn’t a single individual. People have different perspectives. Some will not be likely to change their minds. There are even some individuals for whom building an AGI would actually be a good idea. Those are people who care more about personal gain than they do about the safety or future of humanity. Sociopaths of one sort or another are thought to make up perhaps 10% of the population (the 1% diagnosed are the ones who get caught). For a sociopath, it’s a good bet to risk the future of humanity against a chance of becoming the most powerful person alive. There are thought to be a lot of sociopaths in government, even in democratic countries.
    
    So, sooner or later, you’re going to see a government or rich individual working on AGI with the vastly improved compute and algorithmic resources that continued advances in hardware and software will bring. The only way to enforce a permanent ban would be to ban computers, or have a global panopticon that monitors what every computer is doing. That might well lead to a repressive global regime that stays in power permanently. That is an S-risk; a scenario in which humanity suffers forever. That’s arguably worse than dying in an attempt to achieve AGI.
    
    Those are weak and loose arguments, but I think that describes the core of my and probably many others’ thinking on the topic.
- rotatingpaguro 24 Oct 2023 21:16 UTC
  1 point
  0
  Parent
  I am under the impression that, when counting words in public for strategic political reasons, it’s better to be a crazy mogul that shouts extreme takes with confidence, to make your positions clear, even if people already know they can’t take your word to the letter. But I’m not sure I know who’s the strategic target here.
Gesild Muka 24 Oct 2023 13:42 UTC
11 points
10
hiding your beliefs, in ways that predictably leads people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.
I think people generally lie WAY more than we realize and most lies are lies of omission. I don’t think deception is usually the immediate motivation; rather, the lies arise from a kind of social convenience. Maintaining social equilibrium is valued over openness or honesty regarding relevant beliefs that may come up in everyday life.
- William the Kiwi 26 Oct 2023 16:17 UTC
  2 points
  0
  Parent
  I would agree that people lie way more than they realise. Many of these lies are self-deception.
Erich_Grunewald 24 Oct 2023 13:59 UTC
9 points
1
In a saner world, all AGI progress should have already stopped. If we don’t, there’s more than a 10% chance we all die.
Could you spell out what you mean by “in a sane world”? I suspect a bunch of people you disagree with do not favor a pause due to various empirical facts about the world (e.g., there being competitors like Meta).
DanielFilan 25 Oct 2023 7:24 UTC
LW: 7 AF: 4
7
AF

hiding your beliefs, in ways that predictably leads people to believe false things, is lying

I think this has got to be tempered by Grice to be accurate. Like, if I don’t bring up some unusual fact about my life in a brief conversation (e.g. that I consume iron supplements once a week), this predictably leads people to believe something false about my life (that I do not consume iron supplements once a week), but is not reasonably understood as the bad type of lie—otherwise to be an honest person I’d have to tell everyone tons of minutiae about myself all the time that they don’t care about.

Is this relevant to the point of the post? Maybe a bit—if I (that is, literally me) don’t tell the world that I wish people would stop advancing the frontier of AI, I don’t think that’s terribly deceitful or ruining coordination. What has to be true for me to have a duty to say that? Maybe for me to be a big AI thinkfluencer or something? I’m not sure, and the post doesn’t really make it clear.
Dagon 24 Oct 2023 17:04 UTC
7 points
−5
Upvoted, and thanks for writing this. I disagree on multiple dimensions—on the object level, I don’t think ANY research topic can be stopped for very long, and I don’t think AI specifically gets much safer with any achievable finite pause, compared to a slowdown and standard of care for roughly the same duration. On the strategy level, I wonder what other topics you’d use as support for your thesis (if you feel extreme measures are correct, advocate for them). US Gun Control? Drug legalization or enforcement? Private capital ownership?
On the “be honest and direct” side, hiding your true beliefs does lead to correctness of slippery-slope fears. “If we allow compromise X, next they’ll push for X+1” is actually the truth on such topics. It’s not clear that it MATTERS if your opponents/disbelievers fear the slippery slope or if they know for certain that you want the endpoint.
On the “push for an achievable compromise position” side, there are a few major benefits. First, it may actually work—you may get some improvement. Second, it leads to discussion and exploration of that point on the continuum, and will shift the Overton window a bit. Third, it keeps you enough in the mainstream that you can work toward your REAL goals (AI safety, not AI pause) with all the tools available, rather than being on the fringe and nobody listening to anything you say.
Lao Mein 25 Oct 2023 1:07 UTC
5 points
1
Counterpoint: we are better off using what political/social capital we have to advocate for more public funding in AI alignment. I think of slowing down AI capabilities research as just a means of buying time to get more AI alignment funding—but essentially useless unless combined with a strong effort to get money into AI alignment.
Max H 24 Oct 2023 14:51 UTC
5 points
−1
Hmm, I’m in favor of an immediate stop (and of people being more honest about their beliefs) but in my experience the lying / hiding frame doesn’t actually describe many people.
This is maybe even harsher than what you said in some ways, but to me it feels more like even very bright alignment researchers are often confused and getting caught in shell games with alignment, postulating that we’ll be able to build “human level” AI, which somehow just doesn’t do a bunch of bad things that smart humans are clearly capable of. And if even the most technical people are confused when talking to each other, I wouldn’t expect leadership of big labs to do better when talking to the public, even if they’re being scrupulously honest about their own beliefs.
My biggest issue with e.g. “RSPs are pauses done right” is actually the proposed unpause condition:
1. Once labs start to reach models that pose a potential takeover risk, they either:
  Solve mechanistic interpretability to a sufficient extent that they are able to pass an understanding-based eval and demonstrate that their models are safe.
  Get blocked on scaling until mechanistic interpretability is solved, forcing a reroute of resources from scaling to interpretability.
There would probably be even more genuine confusion / disagreement on the topic, but I think talking openly about much stricter unpause conditions would be good. I think “solve mechanistic interpretability” and “pass (both kinds of) evals” is not really close to sufficient.
My criteria for an unpause would look more like (just an example to give flavor, not meant to be realistic / well thought-out):
- There’s broad consensus and understanding about which object-level things (shutdown problem, embedded agency, etc.) are actually relevant to alignment, and what the shape of solutions even looks like.
- Research on topics like “Human wanting” is filled with precise math and diagrams and gears-level maps to neuroscience. (But the math and diagrams and concepts themselves should be completely independent of the neuroscience; links to human neuroscience are checksums for validity, not crutches for understanding.)
- We have complete solutions to things like the shutdown problem. (I don’t think “build in a shutdown button via giving the AI invulnerable incomplete preferences” should be part of anyone’s actual alignment plan, but it should be the kind of thing we know how to do before scaling.)
I think if everyone already saw why mechanistic interpretability is insufficient, they could also see why it would be better to just institute a pause now. But they don’t, so they continue to push for scaling in a genuine pursuit of deconfusion. Not great! But also not exactly dishonest or even inaccurate; no one has actually figured out a better way to reliably deconfuse people so far, and scaling further does seem likely to actually work for that, one way or another.
Gurkenglas 24 Oct 2023 13:35 UTC
5 points
0
People who think that it’s deontologically fine to remain silent might not come out and say it.
Shankar Sivarajan 26 Oct 2023 14:04 UTC
4 points
2
Consider what happens when a community rewards the people who gain more influence by lying!
This is widely considered a better form of government than hereditary aristocracies.
MinusGix 25 Oct 2023 13:22 UTC
3 points
1
I agree with others to a large degree about the framing/tone/specific-words not being great, though I agree with a lot of the post itself, but really that’s what this whole post is about: that dressing up your words and saying partial in-the-middle positions can harm the environment of discussion. That saying what you truly believe then lets you argue down from that, rather than doing the arguing down against yourself—and implicitly against all the other people who hold a similar ideal belief as you. I’ve noticed similar facets of what the post gestures at, where people pre-select the weaker solutions to the problem as their proposals because they believe that the full version would not be accepted. This is often even true, I do think that completely pausing AI would be hard. But I also think it is counterproductive to start at the weaker more-likely-to-be-satisfiable position, as that gives room to be pushed further down. It also means that the overall presence is on that weaker position, rather than the stronger ideal one, which can make it harder to step towards the ideal.

We could quibble about whether to call it lying (I think the term should be split up into a bunch of different words), but it is obviously downplaying. Potentially for good reason, but I agree with the post that people too often ignore the harms of preemptively downplaying risks. Part of this is me being more skeptical about the weaker proposals than others are. Obviously, if you think RSPs have good chances of decreasing X-risk and/or will serve as a great jumping-off point for better legislation, then the amount of downplaying needed to settle on them is less of a problem.
trevor 24 Oct 2023 20:35 UTC
2 points
2
let us be clear: hiding your beliefs, in ways that predictably leads people to believe false things, is lying. This is the case regardless of your intentions, and regardless of how it feels.
Not only is it morally wrong, it makes for a terrible strategy. As it stands, the AI Safety Community itself can not coordinate to state that we should stop AGI progress right now!
Some dynamics and gears in world models are protected secrets, when they should be open-sourced and researched by more people, and other gears are open-sourced and researched by too many people, when they should be protected secrets. Some things are protected secrets and should be, some things are open-source research and should be.
Each individual thing is determined (and disputed) on a case-by-case basis. For example, I think that the contemporary use of AI for human thought steering should be a gear in more people’s world models, and other people don’t, but we have reasons specific to that topic. There’s no all-encompassing policy here; staying silent can cause massive amounts of harm, but the counterfactual (telling everyone) can sometimes cause much more harm.
jeffreycaruso 4 Mar 2024 4:46 UTC
1 point
0
I don’t see the practical value of a post that starts off with conjecture rather than reality; i.e., “In a saner world....”
You clearly wish that things were different, that investors and corporate executives would simply stop all progress until ironclad safety mechanisms were in place, but wishing doesn’t make it so.
Isn’t the more pressing problem what can be done in the world that we have, rather than in a world that we wish we had?
rotatingpaguro 24 Oct 2023 21:33 UTC
1 point
0
Too many claimed to pursue the following approach:
1. It would be great if AGI progress stopped, but that is infeasible.
2. Therefore, I will advocate for what I think is feasible, even if it is not ideal.
3. The Overton window being what it is, if I claim a belief that is too extreme, or endorse an infeasible policy proposal, people will take me less seriously on the feasible stuff.
4. Given this, I will be tactical in what I say, even though I will avoid stating outright lies.
I think that, if applied strictly to people identified by this list, the post is reasonable. I have the impression that some criticism tries to consider a more general claim. However, I lack social skills, so I may be missing tons of subtext that other people think they can reliably read.
Michael Roe 24 Oct 2023 22:15 UTC
−1 points
0
I think politics often involves bidding for the compromise you think is feasible, rather than what you’d ideally want.
What’s maybe different in the AI risk case, and others like it, is how you’ll be regarded when things go wrong.
Hypothetical scenario:
1. An AI does something destructive, on the order of 9/11
2. Government over-reacts as usual, and cracks down on the AI companies like the US did on Osama bin Laden, or Israel did on Hamas.
3. You are like: yeah, we knew that was going to happen.
4. Government to you: what the fuck? Why didn’t you tell us?
