I write about rationality, coordination, and AI. I’m particularly interested in the coordination challenges associated with AI safety.
Against Moloch
Thank you for the correction—I’d totally forgotten about OpenAI’s Trusted Access program. Updating the post accordingly.
Personally, I’m using the Claude app, running in the Cowork tab. I like that because I can give Claude access to a large set of documents including my writing and the style guide all at once. Claude Code would also work, but I prefer Cowork for simple text-oriented tasks.
I believe you can also add a file from the claude.ai interface, which might be slightly more convenient than copying and pasting (not sure if that’s available at every tier).
My biggest takeaway is that this supports a recent shift in my thinking: in the short run, I now worry more about AI causing severe disruption in cybersecurity, and somewhat less about biorisk. (Obviously, biorisk remains the more catastrophic danger, and we aren’t far from the point where it becomes critical.)
Recent SOTA models including Opus 4.6 are right on the cusp of being able to cause major cyber disruptions, and it sounds like Mythos / Capybara is well into dangerous territory.
I’d guess that’s pretty hard since the distillation attacks are intentionally spread across many accounts and conversations, and mixed in with innocuous requests. But on the other hand, the models are getting quite good at noticing subtle tells that humans would miss.
I suppose another question is: if you’re a big lab and you detect a distillation attack, do you block it or do you quietly feed it poisoned responses?
If you have time, I’d love to read that.
Agreed. I feel like there’s an argument to be made that consciousness is similar in nature to thinking, and thinking seems computational, so… but I haven’t seen a really compelling version of that argument anywhere.
This is great and I’d love to see it go further.
I wonder if there’s a component of decision handoff that could be characterized as “epistemic handoff”? If the president is making all his own decisions, but basing them on briefings and analyses provided by Agent 4, that starts to feel a lot like decision handoff in disguise.
True story. We had family over for brunch today, and one person wanted help with Claude: he’d paid for a year’s subscription, but it wasn’t working. A few minutes was enough to diagnose what had happened:
He went to google.com and searched for Claude
The top link was called Claude, and described itself as the world’s best AI model
He followed the link and paid $60 for a year’s subscription to some random nonsense
Thank you for this—I think it accomplishes its objective very well.
Reading this reinforces my sense that while plenty of people have put forth thoughtful and insightful disagreements with IABIED, there’s no comprehensive counter-argument that comes anywhere near IABIED itself in polish and presentation.
All of this seems solid, but I think there are two additional considerations that push in the opposite direction:
COVID and its aftermath seem to suggest that pandemics make society dumber and more reactive. I wonder if a surge in bioterror would reduce humanity’s decision-making capability at a critical time?
Releasing highly capable open-weights models seems likely to increase existential risk by bringing near-SOTA capabilities to more actors. (For example, North Korea is probably incapable of building a frontier model from scratch, but might be able to perform large-scale fine-tuning to obtain a variety of problematic capabilities, including but not limited to CBRN development.)
Interesting work! I wonder if a more successful way to uncover this kind of deception would be to iterate on what questions you ask each instance you interrogate?
As a simple example, if instance A tells you it needs an extension because it was helping its brother all evening, you might begin by telling instance B that extensions are fine but asking why it didn’t ask for one sooner, in the hope of shifting the focal point.
More realistically, if you think the model is sandbagging about interpretability results, you might give different instances somewhat different interpretability assignments, in the hope of exposing contradictory performance patterns.
This game quickly gets very complex, and it’s more or less axiomatic that humans lose complex games when playing against superhuman AI. But perhaps there’s a window where these techniques might be useful?
I definitely find the presentation useful. In particular, the ability to drill down on each block is great (though it took me a moment to figure out how that worked).
I don’t have strong opinions about the timelines in this case: my intuition is that many dangerous technologies would take longer to design and deploy than the gap itself would persist. I think there’s non-zero risk from that gap, which wasn’t previously on my radar. And I suspect that risk is low relative to other risks like misalignment.