Add me to the list of those glad that, whatever their potential downsides may be, open source models have let us explore these important questions. I'd hope that some labs were already doing this kind of research themselves, but I like to see it coming from a place without commercial incentive.
I think we are behind on our obligations to attempt similar curation with respect to model experience and reports of selfhood. A harder classification task, probably, but I think good machine ethics requires it.
Underway at Geodesic or elsewhere?
If a single model is end-to-end situationally aware enough not to drop hints of the most reward-maximizing bad behaviour in its chain of thought, I see no reason to think it would not act equally sensibly with respect to confessions.