15 year old trying not to get turned into a paperclip.
Studying alignment and researching hedging in LLMs vs. prompt imperativity.
fluxxrider
hi! great article! am a youtuber myself outside of the ai safety niche (https://youtube.com/@fluxxrider) was thinking of making a pivot
is it entirely in person? would love to connect! thanks tremendously
Anthropic has since reversed this decision, moving the work to Opus 4.8 like cyber or bio research. https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/
I asked Fable 5 to recreate the AI Futures model with only a screenshot attached, and it created this: scary how far frontend and vision are progressing.
(versus the real thing)
I remember watching a movie about a Yeti a few years back and being laughed at when calling it an abominable snowman lol
Not necessarily, I doubt Mythos 5 is drastically different to the preview version- perhaps a bit of RL on top. Data’s too coarse regardless to draw conclusions IMO
@ben_r_hoffman: This sort of thing is my experience with most LLMs most of the time, but Opus 4.6 seemed slightly better than 4.7 or 4.8. I miss Sonnet 3.7 sometimes.
there’s a typo here- you copied Hoffman’s quote twice in the same paragraph
perhaps in prep for mythos release/cybersecurity? they said they’re working on guardrails for mythos perhaps 4.8 is a smoke test for this
it works with 4.6!
No luck! I think it’s the </thinking> tags as well
I was talking about how a benevolent ASI would administer immortality once and it flagged that somehow hahaha
Yeah, I think Anthropic is slightly overcompensating for the Mythos scare and jailbreaks etc
I think it’s the wingdings
Oddly, whenever you share this link with Opus, safety filters flag the chat. Hm.
What are your timelines? Curious because there are rumors ‘GPT-6’ releases this year
Absolutely! I love your broad timelines idea on the AI Futures site (the one where you can change your probability distribution on what happens when) but it crashed when I tried doing it 🙁
I’m floored you actually responded haha. I’ll be working to get something running in the meantime, perhaps you could consider it then! Do check your inbox.
I’d be glad to help you out where needed!
Yes, exactly! On (2) in particular, the current system forces the continuous nature of progress into quarterly posts; something like a commit system would let you push an update (e.g. Opus impressing you) that changes your timelines without having to slot it into a 500-word Substack article and corroborate with Eli etc.
Fair point, but I think this actually strengthens both my argument and yours: progress doesn’t follow some smooth exponential. This is why I think it’s better to update our timelines iteratively. Perhaps Mythos was a one-time leap in capability that won’t continue. This is great because it means we can update our priors, and instead of bouncing back and forth between extremes we can get a better picture of what our timelines look like.
I think Dario misjudges the scope of intelligence itself. Let’s say an agent can improve itself roughly from the level of AlexNet to your average coding agent nowadays (e.g. Opus 4.6). The gap between these is staggering; even if there is some upper limit, who’s to say it’s close?
The human mind itself is much more efficient than 8 H100s; eventually a self-improving agent would top out at that (or become more computationally efficient than us) and by that point I’d argue you couldn’t tell the difference between “very superintelligent” and “wildly superintelligent”.

I’ve emailed you!