I write about rationality, coordination, and AI. I’m particularly interested in the coordination challenges associated with AI safety.
Against Moloch
Monday AI Radar #24
Monday AI Radar #23
Monday AI Radar #22
Who I Follow
I don’t have strong opinions about the timelines in this case: my intuition is that many dangerous technologies would take longer to design and deploy than the gap would persist for. I think there’s non-zero risk from that gap, which wasn’t previously on my radar. And I suspect that risk is low relative to other risks like misalignment.
Don’t Cut Yourself on the Jagged Frontier
Monday AI Radar #21
Thank you for the correction—I’d totally forgotten about OpenAI’s Trusted Access program. Updating the post accordingly.
Quick Thoughts About Mythos
Foundational Beliefs
Personally, I’m using the Claude app, running in the Cowork tab. I like that because I can give Claude access to a large set of documents including my writing and the style guide all at once. Claude Code would also work, but I prefer Cowork for simple text-oriented tasks.
I believe you can also add a file from the claude.ai interface, which might be slightly more convenient than copying and pasting (I’m not sure whether that’s available at every tier).
Writing With Robots
Monday AI Radar #19
My biggest take is that this supports a recent shift in my thinking: in the short run, I now worry more about AI causing severe disruption in cybersecurity, and somewhat less about biorisk. (Obviously, biorisk remains the more catastrophic danger, and we aren’t far from the point where it becomes critical.)
Recent SOTA models including Opus 4.6 are right on the cusp of being able to cause major cyber disruptions, and it sounds like Mythos / Capybara is well into dangerous territory.
I’d guess that’s pretty hard since the distillation attacks are intentionally spread across many accounts and conversations, and mixed in with innocuous requests. But on the other hand, the models are getting quite good at noticing subtle tells that humans would miss.
I suppose another question is: if you’re a big lab and you detect a distillation attack, do you block it or do you quietly feed it poisoned responses?
Monday AI Radar #18
If you have time, I’d love to read that.
Agreed. I feel like there’s an argument to be made about how consciousness is similar in nature to thinking, and thinking seems computational, so… but I haven’t seen a really compelling version of that anywhere.
Good catch—I think you’re right about the total pool.
Thinking about it further, I’m very unsure about three things:
How soon will new robots be good enough to meaningfully replace human workers?
Will that capability be retrofittable to older robots, given that it’s largely a matter of software?
Will early human-comparable robots replace humans, augment them, or do entirely new jobs?