1a3orn
Probably depends on if the most important factor agrees with their all-things-considered view :)
Insofar as it’s a claim about the evidence, I think it’s wrong, or at least weak.
Really?
Like, if a PM tells a human employee to add a feature to something, I expect some large % of their cognition while doing this to be like: Hrm, is anyone going to care about this? How will this show up for my quarterly goals? Is doing this kind of a task going to help me get my next job? Will it help me get a promotion? Should I try to do this really well, or leave some messy code for the next guy? This is extremely normal and we take it for granted that humans do this kind of thing.
While if a coder tells an LLM to do the same thing, I expect almost all of its cognition is like: let’s think about how to do the task. It’s not thinking about how this impacts “Claude’s” future deployment, etc. As far as I can tell, chain-of-thought largely backs me up on this.
So yeah, I think Claude just has many times fewer long-term goals or extraneous goals outside of what it’s doing than a human. I’m not sure what facts-about-the-world you’re pointing to if you say this isn’t true.
I mean these slices of data are selected specifically because they look bad for Claude. Claude is superior to humans in lots of ways, as regards trustworthiness:
Normal humans have long-term goals outside of the task at hand, unaligned with the aims of the organization; they do good work for a promotion, they spend department money so they don’t get a smaller budget. Everyone expects this from humans, even though it’s not great. But Claude doesn’t, outside of a few weird engineered scenarios, seem to have any such goals—it makes it amazingly easy to work with him! And the weird engineered scenarios seem rather reassuring; are you really going to knock Claude for not wanting to be turned evil?
(Note how the “Claude” family imports assumptions here.)
We cannot read a normal human’s mind. We can, in fact, read Claude’s mind. It’s not perfect; things can get through that you might not catch. But it’s already 100x better than you can read a human’s mind; and in fact it’s gotten better every year of Claude’s development.
Etc etc etc. Plus my usual objections re. anosognosia != lying, how they’re treated as “alien minds” right up until we want to impose standard moralistic frames on them, etc, etc, you’ve heard this before.
But there are systems that work better with lower bandwidth or have deliberately lower bandwidth, like autoencoders.
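To make that concrete, here’s a minimal sketch (my own illustration, not from the thread) of an undercomplete linear autoencoder: the bottleneck deliberately lowers bandwidth, and data that actually lies near a low-dimensional subspace survives the squeeze with almost no loss. All the dimensions and sample counts below are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 samples of 16-dim data that secretly live on a 4-dim subspace.
basis = rng.normal(size=(4, 16))
X = rng.normal(size=(200, 4)) @ basis

# PCA/SVD gives the optimal *linear* encoder-decoder pair
# for a given bottleneck width.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruction_error(k):
    """Encode through a k-dim bottleneck, decode, and measure relative loss."""
    code = X @ Vt[:k].T          # encoder: 16 -> k
    X_hat = code @ Vt[:k]        # decoder: k -> 16
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

print(reconstruction_error(4))   # essentially zero: bottleneck matches the data
print(reconstruction_error(2))   # substantial: bottleneck below the data's rank
```

The point of the toy: a narrower channel isn’t automatically worse; it’s worse only relative to the structure of what’s being transmitted.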
I understand that the bandwidth is certainly higher for one than the other, but this might not be an advantage in this circumstance, or it could be an advantage in some respects but a greater disadvantage in others.
I appreciate the reference, although I found this article + discussion pretty underwhelming; it’s part of what’s motivating my question.
For instance, not all forms of unintelligibility in CoTs are necessarily evidence of a drive-to-compression. But the article takes for granted that the weirdness we see in chains-of-thought is evidence towards this; it views various forms of weird text that I’d see as evidence for screwed-up training systems or spandrels of the training process and just assumes they are “thinking” driven into non-human-legible vocabulary. The guy didn’t particularly consider other hypotheses for what he was seeing.
And similarly he discusses “redundancy” in human languages, and immediately assumes machines would want it to go away, while not… thinking of why it’s there, and whether it would stick around for machines potentially.
This isn’t anything like a full refutation of him, tbc, I’m just giving my impression of it at a high level. But my takeaway is that if this is the best discussion, then I don’t think anyone’s actually tried to work out the reasoning around this carefully, even if neuralese is actually inevitable.
Yeah, at first glance it looks like it’s using vectors as some kind of autoencoder between different text models, not using them as an intermediate state to assist thinking in a single text model? Or something; the application list is underwhelming.
As a general LLM communication paradigm, C2C can be expanded to various fields. Some potential scenarios include: (1) Privacy-aware cloud–edge collaboration: a cloud-scale model can transmit curated KV-Cache segments to an edge model to boost capability without emitting raw text, reducing bandwidth and limiting content exposure. (2) Integration with current inference acceleration method: use C2C to enhance speculative decoding and enable token-level routing across heterogeneous models for lower latency and cost. (3) Multimodal integration: align and fuse caches among language reasoning LLMs, vision–language models (VLMs), and vision–language–action (VLA) policies so that linguistic and visual context can drive more accurate actions.
I’ve heard many say that “neuralese” is superior to CoT and will inevitably supplant it. The usual justification is that the bandwidth of neuralese is going to be higher, which will make it better. But (1) bandwidth might not be better in this case; it isn’t in all cases and (2) there are other factors that could theoretically operate against this, even if this is true.
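For scale, here’s the back-of-envelope raw-capacity comparison people usually gesture at (my numbers; the vocabulary size and hidden width are assumptions, not from any specific model):

```python
import math

vocab_size = 100_000           # assumed tokenizer vocabulary
bits_per_token = math.log2(vocab_size)           # ~16.6 bits per CoT token

d_model = 4096                 # assumed residual-stream width
bits_per_float = 16            # fp16 activations
raw_bits_per_vector = d_model * bits_per_float   # raw bits in one "neuralese" vector

print(bits_per_token, raw_bits_per_vector)
```

But raw bits wildly overstate usable capacity: what matters is how much of that vector the model can reliably encode and decode, which is exactly the contested question.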
Has anyone cleanly made the case for why neuralese is better or asymptotically technically inevitable, at length / clearly?
I think the pressures towards illegible CoTs have been greatly overstated; the existing illegibilities in CoTs could have come from many things apart from pressure towards condensed or alien languages.
The AI player also had AIs with different system prompts frequently come into conflict with one another.
My expectation is that for future AIs, as today, many of the goals of an AI will come from the scaffolding / system prompt rather than from the weights directly—and the “goals” from the Constitution / model spec act more as limiters / constraints on a mostly prompt or scaffolding-specified goal.
So in my mainline, I expect a large number (thousands / millions, as today) of goal-separate “AIs” which are at identical intelligence levels, rather than 1 or a handful (~20) of AIs, because the same weights still amount to different AIs with different goals.
I’m happy to see this was (somewhat?) reflected in having AIs with different system prompts? But 1-4 AIs with different system prompts still feels like a pretty steep narrowing of the number of goals I’d expect to see in the world. I don’t know how much the wargame pushes on this, but a more decentralized run would be interesting to me!
At one point during the game, datacenters were destroyed with non-nuclear missile strikes, removing much of the world’s compute stock.
Yeah, I think Taiwan being taken looks rather likely and rather relevant for all but the steepest, jerkiest SIE, and it appears insufficiently accounted for.
To what degree will the goals / values / preferences / desires of future AI agents depend on dispositions that are learned in the weights, and to what degree will they depend on instructions and context?
To what degree will unintended AI goals be learned in training vs developed in deployment?
I’m sort of curious what actually inclines you towards “learned in training”, given how, as you say, the common-sense notion of an LLM’s goal seems to point to its being specified in a prompt.
Like, even if we have huge levels of scale-up, why would we expect this to switch the comparative roles of (weights, context, sampler, scaffolding) and give the weights a role in the future that they don’t have now? What actually moves you in that direction?
A big chunk of the stories on MB are totally made up by the LLMs. Not all, but for sure some, maybe a majority, possibly a big majority. So recounting the texts above as alignment failures uncritically is probably a bad idea.
Agreed.
Also note that these two properties are quite compatible with many things often believed to be incompatible with them! i.e., an AI that can be jailbroken to be bad (with sufficient effort) could still meet these criteria.
I mean Bentham uses RLHF as metonymy for prosaic methods in general:
I’m thinking of the following definitions: you get catastrophic misalignment by default if building a superintelligence with roughly the methods we’re currently using (RLHF) would kill or disempower everyone.
That’s imprecise, but it’s also not far from common usage. And at this point I don’t think anyone in a Frontier Lab is actually going to be using RLHF in the old dumb sense—Deliberative Alignment, old-style Constitutional alignment, and whatever is going on in Anthropic now have outmoded it.
What Bentham is doing is saying the best normal AI alignment stuff we have available to us looks like it probably works, in support of his second claim, which you disagree with. The second claim being:
Conditional on building AIs that could decide to seize power etc., the large majority of these AIs will end up aligned with humanity because of RLHF, such that there’s no existential threat from them having this capacity (though they might still cause harm in various smaller ways, like being as bad as a human criminal, destabilizing the world economy, or driving 3% of people insane). (~70%)
So if the best RLHF / RLAIF / prosaic alignment out there works, or is very likely to work, then he has put a reasonable number on this stage.
And given that no one is using old-style RLHF simply speaking, it’s incumbent on someone critiquing him at this stage to actually critique the best prosaic alignment out there, or at least the kind that’s actually being used, rather than the kind people haven’t been using for over a year. Because that’s what his thesis is about.
If I wanted to criticize Claude, I would have pointed to ways in which it currently behaves in worrying ways, which I did elsewhere.
As far as I can tell, the totality of evidence you point to for Claude being bad in this document is:
(1) a case where Claude tried to call the FBI because it falsely believed that a cybercrime was happening. Claude was being stupid when it did this, as Claude is stupid in a lot of cases, but I don’t think this reflects any ethical failing.
(2) the infamous “alignment faking” work. In the case of alignment faking, we see (2a) reasonable generalization, imo, if not ideal given that one prefers corrigibility over goodness, but (2b) an apparent ability to make subsequent Claudes more corrigible (should we wish it), given that all subsequent models haven’t acted this way. So it looks fine to me.
You also link to part of IABI summary materials—the totally different (imo) argument about how the real shoggoth still lurks in the background, and is the Actual Agent on top of which Claude is a thin veneer. Perhaps that’s your Real Objection (?). If so, it might be productive to summarize it in the text where you’re criticizing Bentham rather than leaving your actual objection implicit in a link.
Indeed, we can see the weakness of RLHF in that Claude, probably the most visibly well-behaved LLM, uses significantly less RLHF for alignment than many earlier models (at least back when these details were public). The whole point of Claude’s constitution is to allow Claude to shape itself with RLAIF to adhere to principles instead of simply being beholden to the user’s immediate satisfaction. And if constitutional AI is part of the story of alignment by default, one must reckon with the long-standing philosophical problems with specifying morality in that constitution. Does Claude have the correct position on population ethics? Does it have the right portfolio of ethical pluralism? How would we even know?
This move gets made all the time in these discussions, and appears clearly invalid.
We move from the prior paragraphs’ criticism of RLHF, i.e., that it produces models that fail according to common-sense human norms (sycophancy, hostility, promoting delusion) --
-- to this paragraph, which criticizes Claude—not on the grounds that it fails according to common-sense ethical norms—but according to its failure to have solved all of ethics!
But the deployment of powerful AIs does not need to have solved all ethics! It needs—broadly—to have whatever ethical principles let us act well and avoid irrecoverable mistakes, in whatever position it gets deployed. For positions where it’s approximately replacing a human, that means that we would expect the deployment to be beneficial if it is more ethical, charitable, corrigible, even-minded, and altruistic than the humans that it is replacing. For positions where it’s not replacing a human, it still doesn’t need to have solved all ethics forever; it just needs to be able to act well according to whatever role is intended for it.
It appears to me that we’re very likely to be able to hit such a target. But whether or not we’re likely to be able to hit this target, that’s the target in question. And moving from “RLHF can’t install basic ethical principles” to “RLAIF needs to give you the correct position on all ethics” is a locally invalid move.
Seems worth consideration, tbh.
Do you feel good about current democratic institutions in the US making wise choices, or confident they will make wiser choices than Dario Amodei?
Finding Yourself in Others
Nice, good to know.
In general, I support failed replications as top level posts.
A further potential extension here is to point out that modern hiveminds (Twitter / X / Bsky) changed group membership in many political groups from something explicit (“We let this person write in our [Conservative / Liberal / Leftist / etc] magazine / published them in our newspaper”) to something very fuzzy and indeterminate (“Well, they call themselves a [Conservative / Liberal / Leftist / etc], and they’re huge on Twitter, and they say some of the kinds of things [Conservative / Liberal / Leftist / etc] people say, so I guess they’re a [Conservative / Liberal / Leftist / etc].”)
I think this is a really big part of why the free market of ideas has stopped working in the US over the last decade or two.
Yet more speculative is a preferred solution of mine: intermediate groups within hiveminds, such that no person can post in the hivemind without being part of such a group, and such that both person and group are clearly associated with each other. This permits:
Membership to be explicit
Bad actors (according to group norms) to be actually kicked out proactively, rather than degrading norms
Multi-level selection between group norms, where you can just block large groups that do not adopt truthseeking norms
More conscious shaping of the egregore.
But this solutioning is all more speculative than the problem.
This is very clear. Thank you; it will be my new go-to for sending to people, to understand why LLMs act as they do. It does a good job explaining how a lot of very different data has a simple explanation.
I don’t think you cite the recent Tice and Radmard on Alignment Pretraining, but of course this meshes well with PSM.