Views my own, not my employer's.
This is not good. Why should people run the risk of interacting with the AI safety community if this is true?
There’s a pressure to have a response or to continue the conversation in many cases. Particularly for moral issues, it is hard to say “I don’t know enough / I’ll have to think about it”, since that also pushes against the expectation that “I’m supposed to have a deep, independent, strong moral commitment”. We expect moral issues to have a level of intuitive clarity.
For those who rely on intelligence enhancement as a component of their AI safety strategy, it would be a good time to get your press lines straight. The association of AI safety with eugenics (whether you personally agree with that label or not) strikes me as a soft target and a simple way to keep AI safety a marginal movement.
It’s interesting to read this in the context of the discussion of polarisation. Was this the first polarisation?
Sorry, it was perhaps unfair of me to pick on you for making the same sort of freehand argument that many others have made; maybe I should write a top-level post about it.
To clarify: the claims that “climate change is not being solved because of polarisation” and that “AI safety would suffer from being like climate action [due to the previous]” are twins, and neither is obvious. These arguments seem surface-level reasonable because they hinge on a lot of internal American politics, and I don’t think they engage with the breadth of drivers of climate action. To some extent they betray the lip service paid to AI safety being an international movement, because they seek to explain the solution of an international problem solely within the framework of US politics. I also feel the polarisation of climate change is itself sensationalised.
But I think what you’ve said here is more interesting:
One might suppose that creating polarization leads to false balance arguments, because then there are two sides so to be fair we should balance both of them. If there are just a range of opinions, false balance is less easy to argue for.

It seems like you believe that the opposite of polarisation is plurality (all arguments seen as equally valid), whereas I would see the opposite of polarisation as consensus (one argument is seen as valid). Both contrast with polarisation itself (different groups see different arguments as valid). “Valid” here means something more like “respectable” than “100% accurate”. But indeed, it’s not obvious to me that the chain of causality is polarisation → desire for false balance, rather than desire for false balance → polarisation. (Also a handwavey gesture at the idea that this desire for false balance comes from conflicting goals, a la conflict theory.)
Like Daniel wrote in a comment, it’s good to think of Agent-5 as distributed and able to nudge things all over the internet
There’s a real missing model or mechanism by which any of the AIs go from “able to nudge large categories of people” to “able to persuade individual actors to do distinct things”. There’s quite a bit of research in the former category, but I don’t see any evidence for the latter. (Maybe truesight + blackmail would count?)
climate change issue worked out with polarization involved
The climate change issue has pretty widespread international agreement and in most countries is considered a bipartisan issue. The capture of climate change by polarising forces has not really affected intervention outcomes (other problems of implementation are, imo, far greater).
I don’t want to derail the AI safety organising conversation, but I see this climate change comparison come up a lot. It strikes me as a pretty low-quality argument and it’s not clear a) whether the central claim is even true and b) whether it is transferable to organising in AI safety.
The flipside of the polarisation issue is the “false balance” issue, and that reference to smoking by TsviBT seems to be what this discussion is pointing at.
robust to strong optimization
Has optimisation “strength” been conceptualised anywhere? What does it mean for an agent to be a strong versus weak optimiser? Does it go beyond satisficing?
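To gesture at the distinction I have in mind, here is a toy sketch (the utility distribution, search budget, and threshold are all made up for illustration): a satisficer stops at the first option that clears a bar, while a “stronger” optimiser keeps pushing further into the tail of the outcome distribution.

```python
import numpy as np

# Toy illustration only: utilities drawn from a made-up distribution.
rng = np.random.default_rng(0)
options = rng.normal(size=10_000)  # utility of each candidate action

def satisfice(utilities, threshold):
    """Take the first option whose utility clears the threshold."""
    for u in utilities:
        if u >= threshold:
            return float(u)
    return float(utilities.max())  # fall back to the best seen

def optimise(utilities):
    """Exhaustively search for the argmax."""
    return float(utilities.max())

print(satisfice(options, threshold=1.0))  # "good enough", found early
print(optimise(options))                  # deep in the tail of the distribution
```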
I don’t expect there to be any incidents where an AI system schemes against the company that created it, takes over its datacenter and/or persuades company leadership to trust it, and then gets up to further shenanigans before getting caught and shutdown.
Have you written more about your thought process on this somewhere else?
in the longer term I expect to try to do political advocacy as well
I find this idea particularly fraught. I already find it somewhat difficult to engage on this site due to the contentious theories some members hold, and I echo testingthewater’s warning against the trap of reopening these old controversies. You’re trying to thread a really fine needle between “meaningfully advocate change” and “open all possible debates” that I don’t think is feasible.
The site is currently watching a major push from Yudkowsky and Soares’ book launch towards a broad coalition for an AI pause. It really only takes a couple major incidents of connecting the idea to ethnonationalism, scientific racism and/or dictatorship for the targets of your advocacy to turn away.
I’m not going to suggest you stay on-message (lw is way too “truth-seeking” for that to reach anyone), but you should carefully consider the ways in which your future goals conflict.
There is a lot of work on this under the title “background extinction rate”: see https://www.nature.com/articles/nature09678 for a review. Estimates for the current extinction rate (measured in extinctions per million species-years) range anywhere from 10x to 1000x faster than the background extinction rate, but they depend a lot on the technique used and the time interval measured. EDIT: typo in numbers
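To make the unit concrete, a back-of-the-envelope sketch (the background rate and species count below are placeholder assumptions, not figures from the review; only the 10-1000x range comes from the estimates above):

```python
# Converting E/MSY (extinctions per million species-years) into extinctions/year.
background_rate_emsy = 1.0   # assumed background rate, for illustration only
n_species = 10_000_000       # assumed global species count, for illustration only

for multiplier in (1, 10, 1000):
    rate_emsy = background_rate_emsy * multiplier
    extinctions_per_year = rate_emsy * n_species / 1_000_000
    print(f"{multiplier:>4}x background: ~{extinctions_per_year:,.0f} extinctions/year")
```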
Based on how much habitat we’ve destroyed, and assuming some number of species per unit-area
Can you elaborate on this? What about the estimates did you find implausible?
Species-area relationships are pretty reliable when used for estimating other factors, but using them for extinction estimation is upward-biased. https://doi.org/10.1038/nature09985 suggests this overestimation is of a similar magnitude to the underestimation caused by dark extinctions (extinctions of species before they are classified).
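For context, the standard backward species-area extrapolation these estimates rest on looks like the sketch below (the exponent and habitat-loss fraction are illustrative placeholders, not values from the paper). He and Hubbell’s point is that the area needed to remove a species’ last individual is larger than the area needed to first encounter it, so running the curve backwards overestimates loss.

```python
# Species-area relationship: S = c * A**z. Running it backwards gives the
# usual habitat-loss extinction estimate:
#   fraction of species lost ≈ 1 - (A_remaining / A_original)**z
# Both numbers below are illustrative placeholders.

z = 0.25                 # assumed SAR exponent
habitat_remaining = 0.5  # assumed fraction of original habitat left

fraction_lost = 1 - habitat_remaining ** z
print(f"Predicted species loss: {fraction_lost:.1%}")  # ~15.9% under these assumptions
```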
The issue is greater when people do not have the relevant expertise to judge whether the LLM output is correct or useful. People who find things on websites generally have to evaluate the content, but several times people have responded to my questions with LLM output that plainly does not answer the question or show insight into the problem.
It’s not obvious to me that an illegal act of this nature would complete successfully. Why wouldn’t a company that did such a thing be court-ordered to reassemble? How much time do you think humanity would gain from this?
I don’t believe anyone was forecasting this result, no.
EDIT: To clarify, many forecasts made no distinction as to whether an AI model had a major formal-methods component like AlphaProof or not. I’m drawing attention to the fact that the two situations are distinct and require distinct updates. What those updates are, I’m not sure yet.
Would this be legal for any currently-existing lab to do? I doubt it, but I am not a lawyer.
I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM reasoning system from GDM would be something else, and it is important for the discussion around this not to confuse one forecast with the other. (Not saying you are, but in general it is hard to tell which claim people are putting forward.)
Do LLMs themselves internalise a definition of AI safety like this? A quick check of Claude 4 Sonnet suggests no (though it comes from the company most steeped in the x-risk paradigm, so...)
My candidate is “asymmetrist”. Egalitarianism tries to enforce a type of symmetry across the entirety of society. But our job will increasingly be to design societies where the absence of such symmetries is a feature not a bug.
This is a revival of class-collaborationist corporatism, with society stratified by cognition. When cognitive ability is enabled by access to wealth (or historic injustice), this corporatism takes on an authoritarian character. I think the left wing is more than capable of engaging with these issues; it simply rejects them on a normative basis.
This comment was really useful. Have you expanded on this in a post at all?