Views my own, not my employer's.
cdt
I just expect that for cooperation to work at large scales and over the long term, you need to do a bunch of exclusion/separation at smaller scales.
Why not just talk about this instead?
Under many circumstances (when the target is very well loved and the problem lies in a subtle pattern) the only criticism you should expect will come from people who seem kind of crazy or out to get the target.
Consider the Javert Paradox.
Thanks for doing this work, this is a really important paper in my view.
One question that sprang to mind when reading this: to what extent do the disempowerment primitives and amplifying factors correlate with each other? i.e. Are conversations which contain one primitive likely to contain others? Ditto with the amplifying factors?
The impression I took away is that these elements were treated independently, which seems a reasonable modelling assumption, but the rate estimates strike me as quite sensitive to it. Would be happy to be proven wrong.
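To make the sensitivity concrete, here is a minimal sketch with made-up marginal rates (the numbers are my own illustrative assumptions, not figures from the paper): with the same marginal rates for two primitives, the estimated rate of conversations containing at least one primitive is nearly twice as high under independence as under strong co-occurrence.

```python
# Toy illustration (made-up numbers, not from the paper): how the estimated
# rate of "conversation contains at least one primitive" shifts when two
# primitives tend to co-occur rather than appearing independently.

p_a = 0.10  # assumed marginal rate of primitive A
p_b = 0.10  # assumed marginal rate of primitive B

# Under independence: P(A or B) = p_a + p_b - p_a * p_b
rate_independent = p_a + p_b - p_a * p_b

# Under strong positive correlation, P(A and B) approaches min(p_a, p_b):
rate_correlated = p_a + p_b - min(p_a, p_b)

print(f"independent: {rate_independent:.3f}")  # 0.190
print(f"correlated:  {rate_correlated:.3f}")   # 0.100
```

Nothing deep here, just inclusion-exclusion; but it shows why an aggregate "any primitive present" rate computed under independence could be a substantial overestimate if the primitives cluster in the same conversations.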
You may mean phylogenetic inertia.
I think this article would’ve been far better without talking about Greenpeace. The engagement with Greenpeace is brief and low-context but most of the argument relies on taking your position on Greenpeace as fact.
The new short-form content seems clearly way worse. Imagine children switching from watching old television shows to YouTube Kids or Shorts on that same TV.
I agree. In comparison to old-form television shows, I wonder how small the teams that produce short-form content are, and consequently how few people are in a position to moderate and judge its appropriateness.
I experience cognitive dissonance, because my model of Eliezer is of someone who is intelligent, rational, and aiming to use at least his public communications to increase the chance that AI goes well.
Consider that he is just as human and fallible as everyone else. “None of Eliezer’s public communication is -EV for AI safety” is such an incredibly high bar that it is almost certainly not true. We all say things poorly sometimes.
Really enjoyed this!!
Quick question: what does the “% similarity” bar mean? It’s not obviously functional (GO-based), nor is it obviously structural. Several rounds of practice have been derailed by my misinterpreting what it means for a protein to be 95% similar to the target...
I’m pleased to see this, and giving me credit or blame for it is far too generous. It seems many other people have also enjoyed reading it.
I do feel these “reflexive ick reactions to the ideas”, and it is interesting how orthogonal they are to the typical concerns around horizon scanning or post-AGI thought (e.g. coup risk).
Please put this in a top-level post. I don’t agree (or rather I don’t feel it’s this simple), but I really enjoyed reading your two rejoinders here.
I particularly dislike that this topic has stretched into psychoanalysis (of Anthropic staff, of Mikhail Samin, of Richard Ngo), when I felt the best part of this article was its groundedness in fact and non-reliance on speculation. Psychoanalysis of this nature is of dubious use and pretty unfriendly.
Any decision to work with people you don’t know personally that relies on guessing their inner psychology is doomed to fail.
The post contains one explicit call-to-action:
If you are considering joining Anthropic in a non-safety role, I ask you to, besides the general questions, carefully consider the evidence and ask yourself in which direction it is pointing, and whether Anthropic and its leadership, in their current form, are what they present themselves as and are worthy of your trust.
If you work at Anthropic, I ask you to try to better understand the decision-making of the company and to seriously consider stopping work on advancing general AI capabilities or pressuring the company for stronger governance.
This targets a very small proportion of people who read this article. Is there another way we could operationalize this work, one that targets people who aren’t working/aiming to work at Anthropic?
Maybe, but it suffers from both ends of the legitimacy problem. At one extreme, some people will never accept a judgement from an LLM as legitimate. At the other extreme, people will perceive LLMs as being “more than impartial” when, in truth, they are a different kind of arbitrary.
This comment was really useful. Have you expanded on this in a post at all?
This is not good. Why should people run the risk of interacting with the AI safety community if this is true?
There’s a pressure to have a response or to continue the conversation in many cases. Particularly for moral issues, it is hard to say “I don’t know enough / I’ll have to think about it”, since that also pushes against this “I’m supposed to have a deep independent strong moral commitment” concept. We expect moral issues to have a level of intuitive clarity.
For those that rely on intelligence enhancement as a component of their AI safety strategy, it would be a good time to get your press lines straight. The association of AI safety with eugenics (whether you personally agree with that label or not) strikes me as a soft target and a simple way to keep AI safety as a marginal movement.
It’s interesting to read this in the context of the discussion of polarisation. Was this the first polarisation?
Sorry, it was perhaps unfair of me to pick on you for making the same sort of freehand argument that many others have made; maybe I should write a top-level post about it.
To clarify: the claims that “climate change is not being solved because of polarisation” and that “AI safety would suffer from becoming like climate action, for the same reason” are twin claims that are not obvious. These arguments seem surface-level reasonable by hinging on a lot of internal American politics, and I don’t think they engage with the breadth of drivers of climate action. To some extent they expose as lip service the claim that AI safety is an international movement, because they seek to explain the solution to an international problem solely within the framework of US politics. I also feel the polarisation of climate change is itself sensationalised.
But I think what you’ve said here is more interesting:
“One might suppose that creating polarization leads to false balance arguments, because then there are two sides, so to be fair we should balance both of them. If there are just a range of opinions, false balance is less easy to argue for.”

It seems like you believe that the opposite of polarisation is plurality (all arguments seen as equally valid), whereas I would see the opposite of polarisation as consensus (one argument seen as valid), in contrast to polarisation itself (different groups see different arguments as valid). “Valid” here meaning something more like “respectable” than “100% accurate”. But indeed, it’s not obvious to me that the chain of causality runs polarisation → desire for false balance, rather than desire for false balance → polarisation. (Also a handwavy nod to the idea that this desire for false balance comes from conflicting goals, a la conflict theory.)
I feel this article was insufficiently integrative across the fields of evolution, ecology, and conservation science. In the first, it largely ignores the research frontier of speciation with gene flow and the speciation continuum. You also note that the phylogenetic (cladistic) species concept is necessary but not sufficient, yet make no mention of phylogenetic discordance and/or hemiplasy in macroevolutionary time. Obviously you can’t mention everything, but these are massive holes in your conclusion, ones that contemporary speciation research naturally brings up.
In the second, you say that your speciation concept would improve ecology. Why? Ecologists who can see trait variation they are interested in are not going to ignore it on the basis of speciation, which trait variation tracks poorly. The fact that much of the evidence for hybridisation between pairs of taxa is still based on natural-history observations makes this worse, since it is genuinely hard to distinguish a very plastic species from a hybrid zone; I suspect the potential for hybridisation is vastly underestimated for this reason. Measures of biodiversity are already trying to move away from species richness because of this, and that problem would not be solved by a more consistent definition that still has the same fundamental issues.
In the third, conservationists are already integrating along these lines. The IUCN includes subspecies and populations of distinct conservation relevance. It is not clear to me how the population viability concept connects to the species concept debate, except insofar as it gives us a common language, one that already has consensus, for comparing populations. Conservationists and taxonomists definitely understand the definitions; the debate is just a bit intractable and tangential to conservation value.
Having said that, I appreciate that you are writing this under your own name while I am writing under a pseudonym. Compliments on writing good-quality conservation content, even if I disagree with it.