Views my own, not my employer's.
I experience cognitive dissonance, because my model of Eliezer is of someone who is intelligent, rational, and aiming to use at least their public communications to increase the chance that AI goes well.
Consider that he is just as human and fallible as everyone else. “None of Eliezer’s public communication is -EV for AI safety” is such an incredibly high bar that it is almost certainly not true. We all say things that are poorly judged.
Really enjoyed this!!
Quick question: What does the “% similarity” bar mean? It’s not obviously functional (GO-based), nor is it obviously structural. Several rounds of practice have gone astray because I misinterpreted what it means for a protein to be 95% similar to the target...
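To make the ambiguity concrete, here is a minimal sketch of one plausible reading, per-residue sequence identity, with made-up sequences; the metric and the sequences are my assumptions, not necessarily how the tool computes its bar:

```python
# Sketch: "% similarity" could mean per-residue sequence identity (computed below),
# or structural similarity (e.g. RMSD or TM-score over superposed C-alpha atoms),
# or GO-based functional similarity. These readings can diverge sharply.
# The sequences are invented for illustration.

def percent_identity(target: str, design: str) -> float:
    """Percent of positions that match, assuming pre-aligned, equal-length sequences."""
    if len(target) != len(design):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    matches = sum(a == b for a, b in zip(target, design))
    return 100.0 * matches / len(target)

target = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
design = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA"  # one substitution
print(f"{percent_identity(target, design):.1f}% identity")  # ~97.0% under this reading
```

A design that scores 95% under this reading can look very different under a structural or GO-based one, which is exactly where my practice rounds went wrong.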
I’m pleased to see this, but giving me credit or blame for it is far too generous. It seems many other people have also enjoyed reading it.
I do feel these “reflexive ick reactions to the ideas”, and it is interesting how orthogonal they are to the typical concerns around horizon scanning or post-AGI thought (e.g. coup risk).
Please put this in a top-level post. I don’t agree (or rather I don’t feel it’s this simple), but I really enjoyed reading your two rejoinders here.
I particularly dislike that this topic has stretched into psychoanalysis (of Anthropic staff, of Mikhail Samin, of Richard Ngo), when I felt that the best part of this article was its groundedness in fact and its non-reliance on speculation. Psychoanalysis of this nature is of dubious use and pretty unfriendly.
Any decision to work with people you don’t know personally that relies on guessing their inner psychology is doomed to fail.
The post contains one explicit call-to-action:
If you are considering joining Anthropic in a non-safety role, I ask you to, besides the general questions, carefully consider the evidence and ask yourself in which direction it is pointing, and whether Anthropic and its leadership, in their current form, are what they present themselves as and are worthy of your trust.
If you work at Anthropic, I ask you to try to better understand the decision-making of the company and to seriously consider stopping work on advancing general AI capabilities or pressuring the company for stronger governance.
This targets a very small proportion of the people who read this article. Is there another way we could operationalize this work, one that targets people who aren’t working at, or aiming to work at, Anthropic?
Maybe, but it suffers from both ends of the legitimacy problem. At one extreme, some people will never accept a judgement from an LLM as legitimate. At the other extreme, people will perceive LLMs as being “more than impartial” when, in truth, they are a different kind of arbitrary.
This comment was really useful. Have you expanded on this in a post at all?
This is not good. Why should people run the risk of interacting with the AI safety community if this is true?
In many cases there’s a pressure to have a response or to continue the conversation. Particularly for moral issues, it is hard to say “I don’t know enough / I’ll have to think about it”, since that also pushes against this “I’m supposed to have a deep, independent, strong moral commitment” concept. We expect moral issues to have a level of intuitive clarity.
For those who rely on intelligence enhancement as a component of their AI safety strategy, it would be a good time to get your press lines straight. The association of AI safety with eugenics (whether you personally agree with that label or not) strikes me as a soft target and a simple way to keep AI safety a marginal movement.
It’s interesting to read this in the context of the discussion of polarisation. Was this the first polarisation?
Sorry, it was perhaps unfair of me to pick on you for making the same sort of freehand argument that many others have made; maybe I should write a top-level post about it.
To clarify: the claims that “climate change is not being solved because of polarisation” and that “AI safety would suffer from being like climate action [due to the previous]” are twins, and neither is obvious. These arguments seem reasonable on the surface, but they hinge on a lot of internal American politics and I don’t think they engage with the breadth of drivers of climate action. To some extent they betray the lip service paid to AI safety being an international movement, because they seek to explain the solution to an international problem solely within the framework of US politics. I also feel the polarisation of climate change is itself sensationalised.
But I think what you’ve said here is more interesting:
One might suppose that creating polarization leads to false balance arguments, because then there are two sides so to be fair we should balance both of them. If there are just a range of opinions, false balance is less easy to argue for.

It seems like you believe that the opposite of polarisation is plurality (all arguments seen as equally valid), whereas I would see the opposite of polarisation as consensus (one argument is seen as valid), in contrast to polarisation (different groups see different arguments as valid). “Valid” here means something more like “respectable” than “100% accurate”. But indeed, it’s not obvious to me that the chain of causality runs polarisation → desire for false balance rather than desire for false balance → polarisation. (There is also a handwavey nod here to the idea that this desire for false balance comes from conflicting goals, à la conflict theory.)
Like Daniel wrote in a comment, it’s good to think of Agent-5 as distributed and able to nudge things all over the internet
There’s a real gap here: a missing model or mechanism by which any of the AIs goes from “able to nudge large categories of people” to “able to persuade individual actors to do distinct things”. There’s quite a bit of research in the former category, but I don’t see any evidence for the latter. (Maybe truesight + blackmail would count?)
climate change issue worked out with polarization involved
The climate change issue enjoys pretty widespread international agreement and is considered a bipartisan issue in most countries. The capture of climate change by polarising forces has not really affected intervention outcomes (other problems of implementation are, imo, far greater).
I don’t want to derail the AI safety organising conversation, but I see this climate change comparison come up a lot. It strikes me as a pretty low-quality argument and it’s not clear a) whether the central claim is even true and b) whether it is transferable to organising in AI safety.
The flipside of the polarisation issue is the “false balance” issue, and that reference to smoking by TsviBT seems to be what this discussion is pointing at.
robust to strong optimization
Has optimisation “strength” been conceptualised anywhere? What does it mean for an agent to be a strong versus weak optimiser? Does it go beyond satisficing?
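For concreteness, here is a sketch of what a quantitative notion could look like, roughly in the spirit of counting optimisation in bits over a preference ordering; the symbols $X$ (outcome space), $U$ (the optimiser’s preference ranking) and $x^{*}$ (the achieved outcome) are assumptions of the sketch, not a settled definition:

$$\mathrm{OP}(x^{*}) \;=\; -\log_2 \frac{\bigl|\{\, x \in X : U(x) \ge U(x^{*}) \,\}\bigr|}{|X|}$$

Under a measure like this, a “strong” optimiser reliably steers into a vanishingly small top slice of its preference ordering (many bits), while a satisficer only guarantees clearing a fixed threshold, so the two notions do come apart.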
I don’t expect there to be any incidents where an AI system schemes against the company that created it, takes over its datacenter and/or persuades company leadership to trust it, and then gets up to further shenanigans before getting caught and shutdown.
Have you written more about your thought process on this somewhere else?
in the longer term I expect to try to do political advocacy as well
I find this idea particularly fraught. I already find it somewhat difficult to engage on this site because of the contentious theories some members hold, and I echo testingthewater’s warning against the trap of reopening these old controversies. You’re trying to thread a really fine needle between “meaningfully advocating change” and “opening all possible debates”, and I don’t think that is feasible.
The site is currently watching a major push, via Yudkowsky and Soares’ book launch, towards a broad coalition for an AI pause. It would really only take a couple of major incidents connecting the idea to ethnonationalism, scientific racism and/or dictatorship for the targets of your advocacy to turn away.
I’m not going to suggest you stay on-message (LW is way too “truth-seeking” for that to reach anyone), but you should carefully consider the ways in which your future goals conflict.
There is a lot of work on this under the heading “background extinction rate”; see https://www.nature.com/articles/nature09678 for a review. Estimates of the current extinction rate (measured in extinctions per million species-years, E/MSY) range anywhere from 10x to 1,000x faster than the background extinction rate, but the figure depends a lot on the technique used and the time interval measured. EDIT: typo in numbers
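For anyone unfamiliar with the unit, here is a quick illustrative calculation of extinctions per million species-years (E/MSY); the counts and the background figure below are made-up assumptions for the sketch, not numbers from the linked review:

```python
# Illustrative E/MSY calculation with invented numbers (not from the linked review).
observed_extinctions = 5        # extinctions recorded in the study window
species_monitored = 10_000      # species tracked
years_observed = 100            # length of the observation window

species_years = species_monitored * years_observed           # 1,000,000 species-years
rate_e_msy = observed_extinctions / (species_years / 1e6)    # extinctions per million species-years
print(rate_e_msy)  # 5.0 E/MSY

# Against a background rate assumed to be on the order of 0.1 to 1 E/MSY,
# this hypothetical rate is roughly 5x to 50x the background, which illustrates
# how much the multiplier depends on the background estimate and the time window.
```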
I agree. I wonder how small the teams that produce shortform content are in comparison to those behind old-form television shows, and consequently how few people are in a position to moderate it and judge its appropriateness.